Advanced Fine-Tuning for Large Language Models (LLMs)
Fine-tune Large Language Models (LLMs) to enhance AI accuracy and optimize performance with cutting-edge skills employers seek

Language
- English
Topic
- Artificial Intelligence
Skills You Will Learn
- Reinforcement Learning, Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), Hugging Face, Instruction-tuning
Offered By
- IBMSkillsNetwork
Estimated Effort
- 8 hours
Platform
- SkillsNetwork
Last Update
- July 21, 2025
What You Will Learn
- In-demand generative AI engineering skills for fine-tuning LLMs that employers are actively seeking, in just 2 weeks
- Instruction-tuning and reward modeling with Hugging Face, plus LLMs as policies and reinforcement learning from human feedback (RLHF)
- Direct preference optimization (DPO) with Hugging Face, including the partition function and how to derive the optimal solution to a DPO problem
- Proximal policy optimization (PPO) with Hugging Face, including how to create a scoring function and tokenize a dataset
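For a sense of what the DPO topic involves, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is only an illustration and is not taken from the course labs: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions, and the inputs are assumed to be summed log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    All four inputs are log-probabilities of a full response.
    `beta` (assumed default 0.1) scales how strongly the policy is
    pushed away from the reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy favors the chosen response more than the reference does: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's probability relative to the rejected one, measured against the reference model. Libraries such as Hugging Face TRL compute the same quantity over batches of tokenized preference pairs.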
Course Syllabus
Module 0: Welcome
- Video: Course Introduction
- Specialization Overview
- General Information
- Learning Objectives and Syllabus
- Helpful Tips for Course Completion
- Grading Scheme
Module 1: Different Approaches to Fine-Tuning
- Reading: Module Introduction and Learning Objectives
- Video: Basics of Instruction-Tuning
- Video: Instruction-Tuning with Hugging Face
- Reading: Instruction Tuning
- Lab: Instruction Fine-Tuning LLMs
- Video: Reward Modeling: Response Evaluation
- Video: Reward Model Training
- Video: Reward Modeling with Hugging Face
- Reading: Reward Modeling & Response Evaluation
- Lab: Reward Modeling
- Practice Quiz: Different Approaches to Fine-Tuning
- Reading: Module Summary and Highlights
- Graded Quiz: Different Approaches to Fine-Tuning
Module 2: Fine-Tuning Causal LLMs with Human Feedback and Direct Preference
- Reading: Module Introduction and Learning Objectives
- Video: Large Language Models (LLMs) as Distributions
- Video: From Distributions to Policies
- Video: Reinforcement Learning from Human Feedback (RLHF)
- Video: Proximal Policy Optimization (PPO)
- Video: PPO with Hugging Face
- Video: PPO Trainer
- Lab: Reinforcement Learning from Human Feedback Using PPO
- Video: DPO – Partition Function
- Video: DPO – Optimal Solution
- Video: From Optimal Policy to DPO
- Video: DPO with Hugging Face
- Lab: Direct Preference Optimization (DPO) Using Hugging Face
- Reading: Fine-Tune LLMs Locally with InstructLab
- Reading: Module Summary and Highlights
- Practice Quiz: Fine-Tuning Causal LLMs with Human Feedback and Direct Preference
- Graded Quiz: Fine-Tuning Causal LLMs with Human Feedback and Direct Preference
- Reading: Cheat Sheet – Generative AI Advanced Fine-Tuning for LLMs
- Reading: Glossary – Generative AI Advanced Fine-Tuning for LLMs
- Reading: Course Conclusion
- Reading: Congratulations and Next Steps
- Reading: Teams and Acknowledgments
- Copyright and Trademarks
Recommended Skills Prior to Taking this Course

Instructors
Joseph Santarcangelo
Senior Data Scientist at IBM
Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has worked for IBM since completing his Ph.D.
Rav Ahuja
Global Program Director, IBM Skills Network
Rav Ahuja is a Global Program Director at IBM. He leads growth strategy, curriculum creation, and partner programs for the IBM Skills Network. Rav co-founded Cognitive Class, an IBM-led initiative to democratize skills for in-demand technologies. He is based at the IBM Canada Lab in Toronto and specializes in instructional solutions for AI, Data, Software Engineering, and Cloud. Rav presents at events worldwide and has authored numerous papers, articles, books, and courses on managing and analyzing data. Rav holds a B.Eng. from McGill University and an MBA from the University of Western Ontario.
Wojciech "Victor" Fulmyk
Data Scientist at IBM
As a data scientist at the Ecosystems Skills Network at IBM and a Ph.D. candidate in Economics at the University of Calgary, I bring a wealth of experience in unraveling complex problems through the lens of data. What sets me apart is my ability to seamlessly merge technical expertise with effective communication, translating intricate data findings into actionable insights for stakeholders at all levels. Follow my projects to learn data science principles, machine learning algorithms, and artificial intelligence agent implementations.
IBM Skills Network
IBM Skills Network Team
At IBM Skills Network, we know how crucial it is for businesses, professionals, and students to build hands-on, job-ready skills quickly to stay competitive. Our courses are designed by experts who work at the forefront of technological innovation. With years of experience in fields like AI, software development, cybersecurity, data science, and business management, our instructors bring real-world insights and practical, hands-on learning to every module. Whether you're upskilling yourself or your team, we will equip you with the practical experience and future-focused technical and business knowledge you need to succeed in today’s ever-evolving world.