
Generative AI Engineering with Fine Tuning Transformers

Intermediate Course

This course provides you with an overview of how to use transformer-based models for natural language processing (NLP). In this course, you will learn to apply transformer-based models for text classification, focusing on the encoder component. You’ll learn about positional encoding, word embedding, and attention mechanisms in language transformers and their role in capturing contextual information and dependencies. Additionally, you will be introduced to multi-head attention and gain insights into decoder-based language modeling with generative pre-trained transformers (GPT).

4.5 (53 Reviews)

Language

  • English

Topic

  • Artificial Intelligence

Industries

  • Artificial Intelligence

Enrollment Count

  • 6.69K

Skills You Will Learn

  • Fine-tuning LLMs, LoRA and QLoRA, PyTorch, Hugging Face, Pre-training Transformers

Offered By

  • IBM Skills Network

Estimated Effort

  • 3 weeks, 2 hrs

Platform

  • Coursera

Last Update

  • May 18, 2025
About this Course
Welcome to this course on Generative AI Language Modeling with Transformers.
In your generative AI journey, this course is a first step toward learning the fundamental concepts of transformer-based models for natural language processing (NLP). This course is ideal for learners who wish to apply transformer-based models for text classification, specifically focusing on the encoder component.
Let’s get into the details.
This course is part of a specialized program tailored for individuals interested in Generative AI engineering. In this course, you will explore the significance of positional encoding and word embedding, understand attention mechanisms and their role in capturing context and dependencies, and learn about multi-head attention. You'll also learn how to apply transformer-based models for text classification, explicitly focusing on the encoder component. Finally, you will learn about language modeling with the decoder mini-GPT.
You will get hands-on experience through labs on attention mechanisms and positional encoding, applying transformers for classification, and using transformers for translation. You will also build decoder GPT-like models and encoder models with baby BERT.
After completing this course, you will be able to:
  • Apply positional encoding and attention mechanisms in transformer-based architectures to process sequential data.
  • Use transformers for text classification.
  • Use and implement decoder-based models, such as GPT, and encoder-based models, such as BERT, for language modeling.
  • Implement a transformer model to translate text from one language to another.
Who should take this course?
This course is suitable for those interested in AI engineering, such as Deep Learning Engineers, Machine Learning Engineers, and Data Scientists, whose work involves creating, optimizing, training, and deploying AI models. It is specifically designed for those who want to learn about NLP-based applications, data science, and machine learning.
Recommended background
As this is an intermediate-level course, it assumes you have a basic knowledge of Python and PyTorch. You should also be familiar with machine learning and neural network concepts.
Course content
This course is approximately 7 hours long and divided into two modules. You can complete one module weekly or at your own pace.
Week 1 - Module 1: Fundamental Concepts of Transformer Architecture
In Module 1, you will learn about positional encoding, which consists of a series of sine and cosine waves and lets you incorporate information about the position of each embedding within the sequence using PyTorch.
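To make this concrete, here is a minimal sketch of sinusoidal positional encoding in PyTorch; the function name and tensor shapes are illustrative rather than the lab's exact code.

    import math
    import torch

    def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
        """Return a (seq_len, d_model) matrix of sine/cosine position codes."""
        position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                             * (-math.log(10000.0) / d_model))               # (d_model/2,)
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine waves
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine waves
        return pe

    # Add position information to a batch of word embeddings.
    embeddings = torch.randn(2, 10, 512)                               # (batch, seq_len, d_model)
    embeddings = embeddings + sinusoidal_positional_encoding(10, 512)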
As you delve into this module, you will explore self-attention mechanisms that employ query, key, and value matrices. You will apply the attention mechanism to word embeddings and sequences, which helps capture contextual relationships between words. You will also learn how language modeling and self-attention mechanisms generate the query, key, and value from the input word embeddings and learnable parameters.
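As a rough sketch, a single-head version of this self-attention mechanism can be written in PyTorch as follows, assuming three learnable linear projections produce the query, key, and value from the same embeddings; the labs' multi-head implementation differs in detail.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SelfAttention(nn.Module):
        """Single-head self-attention: Q, K, and V come from the same embeddings
        via learnable projections, then scaled dot-product attention combines them."""
        def __init__(self, d_model: int):
            super().__init__()
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (batch, seq_len, d_model)
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5     # (batch, seq_len, seq_len)
            weights = F.softmax(scores, dim=-1)                      # each word attends to every word
            return weights @ v                                       # context-aware representations

    attn = SelfAttention(d_model=512)
    out = attn(torch.randn(2, 10, 512))                              # same shape as the input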
Additionally, you will explore scaled dot-product attention with multiple heads and how the transformer architecture enhances the efficiency of attention mechanisms. You will also learn to implement a series of encoder layer instances in PyTorch.
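For illustration, PyTorch's built-in encoder layer already combines multi-head scaled dot-product attention with a feed-forward network, and several identical layer instances can be stacked; the hyperparameters below are placeholders.

    import torch
    import torch.nn as nn

    # One encoder layer = multi-head scaled dot-product attention + feed-forward network.
    encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                               dim_feedforward=2048, batch_first=True)

    # Stack a series of identical encoder layer instances.
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

    x = torch.randn(2, 10, 512)     # word embeddings plus positional encoding
    contextual = encoder(x)         # (2, 10, 512) context-aware token representations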
Later in this module, you will learn how transformer-based models are used for text classification, how to create the text pipeline and the model, and how to train the model.
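A hypothetical classifier along these lines might look as follows; the vocabulary size, class count, and mean pooling are made up for illustration, and positional encoding is omitted for brevity.

    import torch
    import torch.nn as nn

    class TransformerTextClassifier(nn.Module):
        """Embed tokens, encode them with transformer encoder layers,
        pool over the sequence, and project to class scores."""
        def __init__(self, vocab_size: int, d_model: int = 128, num_classes: int = 4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.classifier = nn.Linear(d_model, num_classes)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:  # (batch, seq_len)
            h = self.encoder(self.embed(token_ids))                  # (batch, seq_len, d_model)
            return self.classifier(h.mean(dim=1))                    # pool, then classify

    model = TransformerTextClassifier(vocab_size=10_000)
    logits = model(torch.randint(0, 10_000, (8, 20)))                # (8, 4) class scores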
Week 2 - Module 2: Advanced Concepts of Transformer Architecture
In Module 2, you will explore decoders and how decoder models are used for text generation. Further, you will learn the architecture of decoder-only generative pre-trained transformers (GPT), the training process of GPT models, and how these models are used to generate text.
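One way to picture the decoder-only setup is the causal mask that hides future tokens, sketched below; the layer sizes are arbitrary, and a real GPT model also adds token and positional embeddings plus a language-modeling head.

    import torch
    import torch.nn as nn

    seq_len = 6
    # Strictly upper-triangular -inf mask: position i may only attend to positions <= i.
    causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

    # A GPT-style decoder block is essentially an encoder layer (no cross-attention)
    # run with the causal mask so it can be trained to predict the next token.
    block = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
    decoder = nn.TransformerEncoder(block, num_layers=2)

    x = torch.randn(1, seq_len, 256)          # embedded tokens of a prompt
    hidden = decoder(x, mask=causal_mask)     # hidden states for next-token prediction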
As you progress through this module, you will learn about bidirectional encoder representations from transformers (BERT), which uses an encoder-only architecture. This allows BERT to process entire text sequences simultaneously, enhancing its understanding of the context and nuances within the text. You will also study masked language modeling (MLM), which involves randomly masking the input tokens and training BERT to predict the original masked tokens.
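A bare-bones sketch of the MLM masking step is shown below; the full BERT recipe also replaces some selected tokens with random tokens or leaves them unchanged, which is omitted here.

    import torch

    def mask_tokens(token_ids: torch.Tensor, mask_token_id: int, mask_prob: float = 0.15):
        """Randomly hide ~15% of tokens; the originals become the prediction targets.
        Unmasked positions get label -100 so the loss ignores them."""
        labels = token_ids.clone()
        chosen = torch.rand(token_ids.shape) < mask_prob     # positions to mask
        labels[~chosen] = -100                               # only predict the masked spots
        inputs = token_ids.clone()
        inputs[chosen] = mask_token_id                       # replace with [MASK]
        return inputs, labels

    ids = torch.randint(5, 1000, (2, 12))                    # toy token ids
    inputs, labels = mask_tokens(ids, mask_token_id=103)     # 103 is [MASK] in the BERT vocab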
You’ll also learn to train BERT models using next-sentence prediction (NSP) tasks and how to fine-tune BERT for a downstream task. The module also covers data preparation: tokenizing, masking, and creating training-ready input for BERT-specific training tasks such as MLM and NSP.
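For example, a sentence pair for NSP can be prepared with the Hugging Face tokenizer roughly as follows; the checkpoint name and sentences are illustrative, and the lab's preprocessing may differ.

    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

    # NSP pairs: the label says whether sentence_b really follows sentence_a.
    sentence_a = "The transformer encoder reads the whole sequence at once."
    sentence_b = "This lets BERT use context from both directions."

    encoded = tokenizer(sentence_a, sentence_b,
                        padding="max_length", truncation=True, max_length=32,
                        return_tensors="pt")

    print(encoded["input_ids"].shape)      # (1, 32): [CLS] sentence A [SEP] sentence B [SEP] + padding
    print(encoded["token_type_ids"][0])    # 0s mark sentence A, 1s mark sentence B
    is_next_label = 1                      # supervision signal for the NSP head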
This module also enables you to explore the encoder-decoder transformer architecture and explains how the model can be used to generate translations. You will then build an encoder-decoder model for translation tasks using PyTorch and train it to generate German-to-English translations.
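A toy sequence-to-sequence setup with PyTorch's nn.Transformer could look like the sketch below, with randomly embedded tensors standing in for real German source and English target sentences.

    import torch
    import torch.nn as nn

    model = nn.Transformer(d_model=256, nhead=8,
                           num_encoder_layers=3, num_decoder_layers=3,
                           batch_first=True)

    src = torch.randn(4, 15, 256)   # embedded German source sentences
    tgt = torch.randn(4, 12, 256)   # embedded (shifted) English target sentences

    # The decoder must not peek at future target tokens during training.
    tgt_mask = torch.triu(torch.full((12, 12), float("-inf")), diagonal=1)

    out = model(src, tgt, tgt_mask=tgt_mask)   # (4, 12, 256); project to the English vocabulary next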
Each module includes a cheat sheet with quick-reference content, such as code snippets. The summaries in each lesson will help you revisit the concepts you learned through the videos. A glossary will help you review the technical terms used in the course. Each module concludes with a final graded quiz.
Please note: When you practice the lab exercises, you should ensure you have well-prepared data by removing unwanted characters, symbols, etc.
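A simple cleanup helper along these lines can be a starting point; which characters to keep depends on your dataset.

    import re

    def clean_text(text: str) -> str:
        """Keep letters, digits, and basic punctuation; collapse repeated whitespace."""
        text = re.sub(r"[^A-Za-z0-9.,!?'\s]", " ", text)   # drop unwanted symbols
        return re.sub(r"\s+", " ", text).strip()

    print(clean_text("Labs need clean text!!  ### remove symbols & extra   spaces"))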
Special Note: This course focuses on the architecture of generative transformer models. You will understand how to apply the concepts for training these models, but in reality, training GPT-like models requires intensive compute resources, huge volumes of data, and a significant amount of time, which cannot be replicated in this course. Also, note that the lab environment provided in this course does not have sufficient resources to adequately train models such as GPT and BERT. Therefore, you have been provided with pre-trained models that you can run in the lab environment. However, if you have a GPU-based machine with plenty of RAM, you may see better results by training the models on your own system.
We wish you good luck completing the course and getting the most out of it!

Instructors

Joseph Santarcangelo

Senior Data Scientist at IBM

Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.

Fateme Akbari

Data Scientist @IBM

I'm a data-driven Ph.D. Candidate at McMaster University and a data scientist at IBM, specializing in machine learning (ML) and natural language processing (NLP). My research focuses on the application of ML in healthcare, and I have a strong record of publications that reflect my commitment to advancing this field. I thrive on tackling complex challenges and developing innovative, ML-based solutions that can make a meaningful impact—not only for humans but for all living beings. Outside of my research, I enjoy exploring nature through trekking and biking, and I love catching ball games.

Kang Wang

Data Scientist

I am a Data Scientist at IBM and a Ph.D. Candidate at the University of Waterloo.

Ashutosh Sagar

Data Scientist

I am currently a Data Scientist at IBM with a Master’s degree in Computer Science from Dalhousie University. I specialize in natural language processing, particularly in semantic similarity search, and have a strong background in working with advanced AI models and technologies.

Contributors

Wojciech "Victor" Fulmyk

Data Scientist at IBM

As a data scientist at the Ecosystems Skills Network at IBM and a Ph.D. candidate in Economics at the University of Calgary, I bring a wealth of experience in unraveling complex problems through the lens of data. What sets me apart is my ability to seamlessly merge technical expertise with effective communication, translating intricate data findings into actionable insights for stakeholders at all levels. Follow my projects to learn data science principles, machine learning algorithms, and artificial intelligence agent implementations.
