Mastering Generative AI: Advanced Fine-Tuning for LLMs
Intermediate Course
Advance your skills in fine-tuning language models with our course on Generative AI. You will explore reinforcement learning techniques, such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), to enhance model security. Learn how to effectively use Hugging Face for instruction tuning. This course is designed for intermediate learners eager to enhance their AI expertise securely.

Language
- English
Topic
- Artificial Intelligence
Enrollment Count
- 651
Skills You Will Learn
- Direct Preference Optimization, Hugging Face, Instruction Tuning, Proximal Policy Optimization, Reinforcement Learning
Offered By
- IBMSkillsNetwork
Estimated Effort
- 2 Weeks 5 hrs
Platform
- edX
Last Update
- March 17, 2026
About this Course
- Welcome to the Mastering Generative AI: Advanced Fine-Tuning for LLMs course!
- This course will take you through advanced techniques for fine-tuning generative large language models (LLMs). You will explore instruction tuning with Hugging Face, delve into reward modeling, and gain hands-on experience training a reward model. You will also learn about proximal policy optimization (PPO) and its application using Hugging Face, understand LLMs as policies, and explore reinforcement learning from human feedback (RLHF). Finally, the course will guide you through direct preference optimization (DPO) using Hugging Face and the partition function.
- Prerequisites
- To get the most out of this course, you should be comfortable with the following topics and technologies:
- A solid understanding of basic Generative AI concepts and models.
- Experience with Python programming, particularly in AI/ML contexts.
- Familiarity with Hugging Face and reinforcement learning concepts.
- Congratulations on taking this step to advance your skills in generative AI! Enjoy your learning journey!
