Generative AI Advanced Fine-Tuning for LLMs
Intermediate Course
Advance your skills in fine-tuning large language models with our course on Generative AI. You will explore reinforcement learning techniques such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) to align model behavior with human preferences, and learn how to use Hugging Face effectively for instruction tuning. This course is designed for intermediate learners eager to deepen their AI expertise.
4.3 (127 Reviews)

Language
- English
Topic
- Security
Industries
- Generative AI
Enrollment Count
- 21.10K
Skills You Will Learn
- Direct Preference Optimization (DPO), Hugging Face, Instruction Tuning, Proximal Policy Optimization (PPO), Reinforcement Learning
Offered By
- IBM Skills Network
Estimated Effort
- 2 weeks/5hrs
Platform
- Coursera
Last Update
- March 17, 2026
About this Course
Welcome to the Generative AI Advanced Fine-Tuning for LLMs course. This course takes you through advanced techniques for fine-tuning generative large language models (LLMs). Throughout this journey, you will explore instruction-tuning with Hugging Face, delve into reward modeling, and gain hands-on experience training a reward model. You will also learn about proximal policy optimization (PPO) and its application using Hugging Face, understand LLMs as policies, and explore reinforcement learning from human feedback (RLHF). Finally, the course guides you through direct preference optimization (DPO) using Hugging Face and the role of the partition function.
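As a preview of what the PPO portion of the course builds toward, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. The function name and sample values are illustrative, not the course's own code:

```python
def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Negative of PPO's clipped surrogate objective for one sample.
    ratio: probability ratio pi_theta(a|s) / pi_theta_old(a|s)
    advantage: estimated advantage A(s, a)
    """
    unclipped = ratio * advantage
    # Clip the ratio to [1 - eps, 1 + eps] before multiplying by the advantage.
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # PPO maximizes the minimum of the two terms; as a loss, we negate it.
    return -min(unclipped, clipped)

# A ratio of 1.5 with positive advantage is clipped at 1 + eps = 1.2:
print(ppo_clip_loss(1.5, 1.0))   # -1.2
# With negative advantage, clipping limits how much the loss can shrink:
print(ppo_clip_loss(0.5, -1.0))  # 0.8
```

The clipping keeps each policy update close to the old policy, which is why PPO is a popular choice for RLHF fine-tuning.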
Prerequisites
To get the most out of this course, you should be comfortable with the following topics and technologies:
- A solid understanding of basic Generative AI concepts and models.
- Experience with Python programming, particularly in AI/ML contexts.
- Familiarity with Hugging Face and reinforcement learning concepts.
If you need to learn more about these topics before taking this course, the following courses will offer the experience you need for success:
Course 1: Generative AI and LLMs: Architecture and Data Preparation
Course 2: Generative AI Foundational Models for NLP & Language Understanding
Course 3: Generative AI Language Modeling with Transformers
Course 4: Generative AI Engineering and Fine-Tuning Transformers
After completing this course, you will be able to:
- Understand and apply instruction-tuning techniques with Hugging Face.
- Develop and train reward models for LLMs.
- Implement proximal policy optimization (PPO) with Hugging Face.
- Leverage LLMs as policies in reinforcement learning contexts.
- Perform direct preference optimization (DPO) using Hugging Face.
Course outline
This course consists of two comprehensive modules:
Module 1: Different Approaches to Fine-Tuning
This module begins by defining instruction-tuning and walking through its process using Hugging Face. You'll then delve into reward modeling, where you'll preprocess a dataset and apply a low-rank adaptation (LoRA) configuration. You'll learn to quantify quality responses, guide model optimization, and incorporate reward preferences. You'll also work with the reward trainer, an advanced technique for training a model, and examine the reward model loss using Hugging Face.
The hands-on labs in this module give you practice with instruction-tuning and reward modeling.
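As a rough sketch of the reward-modeling idea in this module: a reward model is typically trained with a pairwise (Bradley-Terry) loss that pushes the score of the human-preferred response above the rejected one. The function below is illustrative only; Hugging Face's reward trainer computes an equivalent loss over batches of model outputs:

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected), i.e. log(1 + exp(-margin)).
    Small when the chosen response scores higher than the rejected one.
    (Fine for a sketch; real implementations use a numerically
    stable form to avoid overflow for large negative margins.)
    """
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))

# The loss shrinks as the reward gap widens in favor of the chosen response:
print(round(pairwise_reward_loss(2.0, 0.0), 4))  # 0.1269
print(round(pairwise_reward_loss(0.0, 2.0), 4))  # 2.1269
```

Minimizing this loss over many preference pairs teaches the model to assign higher scalar rewards to responses humans prefer.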
Module 2: Fine-Tuning Causal LLMs with Human Feedback and Direct Preference
In this module, you will explore reinforcement learning from human feedback (RLHF) and learn to treat LLMs as policies whose outputs can be optimized against a reward signal. You'll study proximal policy optimization (PPO) and its application using Hugging Face, and then move on to direct preference optimization (DPO), which fine-tunes a model directly on preference data without an explicit reward model, covering its implementation with Hugging Face and the role of the partition function.
In hands-on labs, you will practice fine-tuning causal LLMs with PPO and DPO using Hugging Face.
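A minimal sketch of the DPO loss this course covers, under stated assumptions: inputs are summed token log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. Names and values are illustrative; in practice a library such as Hugging Face TRL's DPOTrainer handles this internally:

```python
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * implicit reward margin).
    The implicit reward of a response is proportional to
    (log pi - log pi_ref); the partition function cancels out of the
    margin, which is why DPO needs no explicit reward model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))

# The policy prefers the chosen response more strongly than the
# reference model does, so the loss is low:
print(round(dpo_loss(-10.0, -20.0, -15.0, -15.0), 4))  # 0.3133
```

With equal margins the loss is log(2); as the policy's preference for the chosen response grows relative to the reference model, the loss falls toward zero.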
Tools/Software
In this course, you will use free versions or trials of several tools including:
- Hugging Face
- Python
- Reinforcement Learning Libraries
- Reward modeling
Tips for success
- Stay organized and keep track of deadlines.
- Regularly practice using the tools and technologies discussed in the course.
- Engage with the community and ask questions if you need help.
- Experiment with the examples provided to deepen your understanding.
Disclaimer: Training, or even fine-tuning, a model from scratch is highly challenging: it requires vast amounts of data and significant GPU resources, and can be extremely expensive in both time and computational cost.
Congratulations on taking this step to advance your skills in generative AI! Enjoy your learning journey!
