Generative AI Advanced Fine-Tuning for LLMs
Intermediate Course
Advance your skills in fine-tuning large language models with our course on Generative AI. You will explore reinforcement learning techniques such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) to align model behavior with human preferences, and learn how to use Hugging Face effectively for instruction tuning. This course is designed for intermediate learners eager to deepen their AI expertise.
4.3 (127 Reviews)

Language
- English
Topic
- Security
Industries
- Generative AI
Enrollment Count
- 21.10K
Skills You Will Learn
- Direct Preference Optimization (DPO), Hugging Face, Instruction Tuning, Proximal Policy Optimization (PPO), Reinforcement Learning
Offered By
- IBM Skills Network
Estimated Effort
- 2 weeks/5hrs
Platform
- Coursera
Last Update
- March 17, 2026
About this Course
Welcome to the Generative AI Advanced Fine-Tuning for LLMs course. This course takes you through advanced techniques for fine-tuning generative large language models (LLMs). Throughout this journey, you will explore instruction-tuning with Hugging Face, delve into reward modeling, and gain hands-on experience training a reward model. You will also learn about proximal policy optimization (PPO) and its application using Hugging Face, understand LLMs as policies, and explore reinforcement learning from human feedback (RLHF). Finally, the course guides you through direct preference optimization (DPO) using Hugging Face and the role of the partition function.
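As a preview of what the PPO portion of the course builds toward, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. The function name and sample values are illustrative, not the course's own code:

```python
def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Negative of PPO's clipped surrogate objective for one sample.
    ratio: probability ratio pi_theta(a|s) / pi_theta_old(a|s)
    advantage: estimated advantage A(s, a)
    """
    unclipped = ratio * advantage
    # Clip the ratio to [1 - eps, 1 + eps] before multiplying by the advantage.
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # PPO maximizes the minimum of the two terms; as a loss, we negate it.
    return -min(unclipped, clipped)

# A ratio of 1.5 with positive advantage is clipped at 1 + eps = 1.2:
print(ppo_clip_loss(1.5, 1.0))   # -1.2
# With negative advantage, clipping limits how much the loss can shrink:
print(ppo_clip_loss(0.5, -1.0))  # 0.8
```

The clipping keeps each policy update close to the old policy, which is why PPO is a popular choice for RLHF fine-tuning.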
Prerequisites
To get the most out of this course, you should be comfortable with the following topics and technologies:
- A solid understanding of basic Generative AI concepts and models.
- Experience with Python programming, particularly in AI/ML contexts.
- Familiarity with Hugging Face and reinforcement learning concepts.
If you need to learn more about these topics before taking this course, the following courses will offer the experience you need for success:
Course 1: Generative AI and LLMs: Architecture and Data Preparation
Course 2: Generative AI Foundational Models for NLP & Language Understanding
Course 3: Generative AI Language Modeling with Transformers
Course 4: Generative AI Engineering and Fine-Tuning Transformers
After completing this course, you will be able to:
- Understand and apply instruction-tuning techniques with Hugging Face.
- Develop and train reward models for LLMs.
- Implement proximal policy optimization (PPO) with Hugging Face.
- Leverage LLMs as policies in reinforcement learning contexts.
- Perform direct preference optimization (DPO) using Hugging Face.
Course outline
This course consists of two comprehensive modules:
Module 1: Different Approaches to Fine-Tuning
This module begins by defining instruction-tuning and walking through its process using Hugging Face. You'll then delve into reward modeling, where you'll preprocess a dataset and apply a low-rank adaptation (LoRA) configuration. You'll learn to quantify quality responses, guide model optimization, and incorporate reward preferences. You'll also work with the reward trainer, an advanced technique for training a model, and examine the reward model loss using Hugging Face.
The hands-on labs in this module give you practice with instruction-tuning and reward modeling.
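As a rough sketch of the reward-modeling idea in this module: a reward model is typically trained with a pairwise (Bradley-Terry) loss that pushes the score of the human-preferred response above the rejected one. The function below is illustrative only; Hugging Face's reward trainer computes an equivalent loss over batches of model outputs:

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected), i.e. log(1 + exp(-margin)).
    Small when the chosen response scores higher than the rejected one.
    (Fine for a sketch; real implementations use a numerically
    stable form to avoid overflow for large negative margins.)
    """
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))

# The loss shrinks as the reward gap widens in favor of the chosen response:
print(round(pairwise_reward_loss(2.0, 0.0), 4))  # 0.1269
print(round(pairwise_reward_loss(0.0, 2.0), 4))  # 2.1269
```

Minimizing this loss over many preference pairs teaches the model to assign higher scalar rewards to responses humans prefer.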
Module 2: Fine-Tuning Causal LLMs with Human Feedback and Direct Preference
In this module, you will explore reinforcement learning from human feedback (RLHF) and learn to treat LLMs as policies whose outputs can be optimized against a reward signal. You'll study proximal policy optimization (PPO) and its application using Hugging Face, and then move on to direct preference optimization (DPO), which fine-tunes a model directly on preference data without an explicit reward model, covering its implementation with Hugging Face and the role of the partition function.
In hands-on labs, you will practice fine-tuning causal LLMs with PPO and DPO using Hugging Face.
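A minimal sketch of the DPO loss this course covers, under stated assumptions: inputs are summed token log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. Names and values are illustrative; in practice a library such as Hugging Face TRL's DPOTrainer handles this internally:

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * implicit reward margin).
    The implicit reward of a response is proportional to
    (log pi - log pi_ref); the partition function cancels out of the
    margin, which is why DPO needs no explicit reward model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))

# The policy prefers the chosen response more strongly than the
# reference model does, so the loss is low:
print(round(dpo_loss(-10.0, -20.0, -15.0, -15.0), 4))  # 0.3133
```

With equal margins the loss is log(2); as the policy's preference for the chosen response grows relative to the reference model, the loss falls toward zero.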
Tools/Software
In this course, you will use free versions or trials of several tools including:
- Hugging Face
- Python
- Reinforcement Learning Libraries
- Reward modeling
Tips for success
- Stay organized and keep track of deadlines.
- Regularly practice using the tools and technologies discussed in the course.
- Engage with the community and ask questions if you need help.
- Experiment with the examples provided to deepen your understanding.
Disclaimer: Training, or even fine-tuning, a model from scratch is highly challenging: it requires vast amounts of data and significant GPU resources, and can be extremely expensive in both time and computational cost.
Congratulations on taking this step to advance your skills in generative AI! Enjoy your learning journey!
