Use DeepEval and Traditional Metrics to assess RAG responses
Explore Large Language Model (LLM) evaluation techniques in this hands-on project that compares LLaMA and Granite for Retrieval-Augmented Generation (RAG) and textual analysis. Use Hugging Face's Evaluate library to compute traditional metrics, and DeepEval, a modern LLM-based framework, to evaluate more complex ones. Through step-by-step guidance, you’ll set up RAG and metric evaluation pipelines, interpret the results, and discover how modular metrics adapt to any LLM use case. Enroll now to gain essential data science expertise and confidently deploy robust RAG applications.
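To make "traditional metrics" concrete, here is a minimal sketch of ROUGE-1, the unigram-overlap score behind the ROUGE family. It is hand-rolled for illustration and is not the Evaluate library's implementation. In practice you would typically call `evaluate.load("rouge")` and its `compute(predictions=..., references=...)` method instead. The sketch assumes simple lowercase whitespace tokenization with no stemming or punctuation handling, and the question and answers are hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Assumption: lowercase whitespace tokenization only (real ROUGE
    implementations also strip punctuation and may apply stemming).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a shared token counts at most min(pred count, ref count) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answers from two models to the same narrative-QA question
    reference = "the captain hid the map in the lighthouse"
    answers = {
        "model_a": "the captain hid the map in the old lighthouse",
        "model_b": "a sailor buried the treasure on the beach",
    }
    for name, answer in answers.items():
        print(name, round(rouge1_f1(answer, reference), 3))
```

Here `model_a` scores about 0.941 (16/17) and `model_b` scores 0.25. Lexical-overlap metrics like this reward shared wording, which is why the project pairs them with BERTScore and LLM-based judgments from DeepEval.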

Language
- English
Topic
- Artificial Intelligence
Skills You Will Learn
- LLM Evaluation, DeepEval, RAG, ROUGE, BERTScore, Generative AI
Offered By
- IBMSkillsNetwork
Estimated Effort
- 20 mins
Platform
- SkillsNetwork
Last Update
- August 29, 2025
A Look at the Project Ahead
- Set Up a RAG Pipeline: Integrate LLaMA and Granite with vector stores to retrieve relevant context for narrative QA.
- Compute and Compare Metrics: Apply ROUGE and BERTScore to quantify model and retrieval quality, then interpret results.
- Implement Evaluation Workflows: Use DeepEval to orchestrate human-like judgments alongside automatic metrics.
- Explore Modularity: See how easily you can swap in new models, datasets, or metrics for future experiments.
- Visualize and Interpret Results: Plot the computed scores to compare model performance across metrics.
- Design and deploy a retrieval-augmented generation pipeline using popular open-source LLMs.
- Build a flexible evaluation framework that combines automatic scoring with LLM-driven judgment, and analyze metric outputs to guide model selection.
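The retrieval step of a RAG pipeline can be sketched in a few lines. The sketch below uses a toy bag-of-words cosine similarity as a stand-in for the embedding model and vector store the project uses. The helper names (`retrieve`, `build_prompt`), the prompt template, and the sample chunks are all hypothetical, and the call that sends the prompt to LLaMA or Granite is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a stand-in for a real tokenizer/embedder
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (hypothetical helper)."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Hypothetical template: the retrieved context is placed ahead of the question
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    chunks = [
        "The captain hid the map in the lighthouse before the storm.",
        "The village held a festival every summer.",
        "A storm destroyed the lighthouse lamp years later.",
    ]
    question = "Where did the captain hide the map?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Because retrieval and prompt construction are separate functions, you can swap the toy similarity for a real embedding model, or change `k`, without touching the generation or evaluation code. This is the kind of modularity the project emphasizes.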
What You'll Need
- Basic Python proficiency: Comfortable with common data structures and writing simple scripts.
- Modern web browser: Latest version of Chrome, Edge, Firefox, or Safari for the optimal notebook experience.
- (Optional) Library knowledge: Basic familiarity with the pandas DataFrame and Matplotlib plotting.

Instructors
Contributors
Joseph Santarcangelo
Senior Data Scientist at IBM
Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.
Wojciech "Victor" Fulmyk
Data Scientist at IBM
As a data scientist at the Ecosystems Skills Network at IBM and a Ph.D. candidate in Economics at the University of Calgary, I bring a wealth of experience in unraveling complex problems through the lens of data. What sets me apart is my ability to seamlessly merge technical expertise with effective communication, translating intricate data findings into actionable insights for stakeholders at all levels. Follow my projects to learn data science principles, machine learning algorithms, and artificial intelligence agent implementations.