
Automate ML Pipelines Using Apache Airflow

Intermediate Guided Project

In this guided project, you will gain hands-on experience building a KNN classification model for the Iris dataset and using Apache Airflow to automate the workflow. You will also learn how to deploy the trained model for prediction and how to generate a DAG (Directed Acyclic Graph) for a data pipeline. Automating the pipeline can increase productivity, reduce costs, and shorten time-to-insight. These skills are essential for any data scientist or engineer working on classification tasks and data pipelines, and they can be applied to a wide range of other datasets and workflows.

4.4 (34 Reviews)

Language

  • English

Topic

  • Data Science

Industries

  • Information Technology

Enrollment Count

  • 338

Skills You Will Learn

  • Machine Learning, Python, Artificial Intelligence, Data Science

Offered By

  • IBM

Estimated Effort

  • 45 minutes

Platform

  • SkillsNetwork

Last Update

  • May 5, 2025

About this Guided Project

Why you should do this Guided Project

This guided project combines two powerful technologies: Apache Airflow and machine learning classification algorithms. In this project, you will learn how to use Airflow to create a workflow that trains and tests a classification model on a dataset. You will also explore classification algorithms such as KNN (K-Nearest Neighbors) and consider which algorithm is best suited to the dataset. By the end of the project, you will have a complete workflow for training and testing a classification model, making it easy to deploy the model to production and generate its DAG (Directed Acyclic Graph).

The Iris dataset is a well-known and widely used dataset in the field of machine learning. It consists of measurements of three species of iris flowers and is commonly used as a benchmark dataset for classification models. In this project, you will gain hands-on experience in building a classification model using the K-Nearest Neighbors (KNN) algorithm, a popular machine learning algorithm for classification tasks.
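
As a rough illustration of the KNN idea described above, here is a minimal sketch in plain Python. It is not the project's own code: the project may well use a library such as scikit-learn, and the function name `knn_predict`, the choice of `k=3`, and the handful of Iris-style sample rows are all assumptions made for this example.

```python
import math
from collections import Counter

# Illustrative rows in the style of the Iris dataset:
# (sepal length, sepal width, petal length, petal width) in cm, plus a species label.
# This is a tiny hand-picked sample, not the full dataset.
TRAINING_DATA = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def knn_predict(sample, training_data, k=3):
    """Return the majority label among the k training rows closest to `sample`."""
    # Sort training rows by Euclidean distance to the sample and keep the k nearest.
    nearest = sorted(training_data, key=lambda row: math.dist(sample, row[0]))[:k]
    # Each of the k neighbours casts one vote for its own label.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Classify a new flower with short petals.
print(knn_predict((5.0, 3.4, 1.5, 0.2), TRAINING_DATA, k=3))
```

The classifier stores the training rows and does all of its work at prediction time, which is why KNN is often described as a "lazy" learner.
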
This project provides a structured approach to building a classification model that can be easily adapted to other datasets and workflows. The use of Apache Airflow allows for the automation of the entire process, from data preprocessing to model evaluation and deployment, making it easy to incorporate this workflow into your projects.

This project is also an opportunity to learn and practice using Apache Airflow, an open-source platform for programmatically creating, scheduling, and monitoring workflows. Airflow provides a user-friendly interface for building, testing, and deploying data pipelines, making it an essential tool for any data scientist or engineer.


A Look at the Project Ahead

After completing this guided project you will be able to:
  • Understand the K-Nearest Neighbors (KNN) algorithm and its use in classification tasks.
  • Implement an Apache Airflow workflow to automate the process of data preprocessing, model training, and evaluation.
  • Use Airflow to schedule and monitor the execution of the workflow, and to visualise the results.
  • Create a DAG (Directed Acyclic Graph) using Apache Airflow, which is a collection of tasks and dependencies that represent a data pipeline.
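
To make the idea of a DAG as "a collection of tasks and dependencies" more concrete, here is a small sketch that uses only Python's standard `graphlib` module. It deliberately does not use the Airflow API; the task names (`preprocess`, `train`, `evaluate`, `deploy`) and the helper `run_pipeline` are hypothetical and chosen only to mirror the pipeline stages mentioned in this description.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline tasks; each one just reports what it would do.
def preprocess():
    return "data preprocessed"

def train():
    return "model trained"

def evaluate():
    return "model evaluated"

def deploy():
    return "model deployed"

TASKS = {"preprocess": preprocess, "train": train, "evaluate": evaluate, "deploy": deploy}

# Each key depends on the tasks in its set, i.e. those must finish first.
DEPENDENCIES = {
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run_pipeline(tasks, dependencies):
    """Run every task after all of its upstream tasks; return the execution order."""
    # static_order() raises CycleError if the dependencies contain a cycle,
    # which is exactly what the "acyclic" in DAG rules out.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


print(run_pipeline(TASKS, DEPENDENCIES))
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this basic ordering guarantee.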

What You'll Need

To complete this guided project, a basic understanding of machine learning is recommended, along with some prior experience in Python so that the code is easy to follow.
This project mainly uses Python. These skills are recommended rather than required: the Guided Project is designed to be accessible to complete beginners.

Instructors

Joseph Santarcangelo

Senior Data Scientist at IBM

Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since completing his Ph.D.


Jigisha Barbhaya

Data Scientist

I am a data scientist at IBM and lead instructor at Skills Network. I love to learn and educate. I completed my MSc (Computer Application) with a specialisation in data science at Symbiosis University.


Contributors

Efkan Serhat Goktepe

Developer | Architect

Efkan is a fourth-year Computer Science student at the University of Toronto. Efkan is currently working at IBM as a Software Architect. Contact: efkan@ibm.com.


J.C.(Junxing) Chen

Data scientist at IBM

Data science is easy and helpful! I want to introduce everyone to data science and help everyone use it in everyday life. Beyond being a data science guide, I also want to make friends with people like you! As a data scientist, I hope that spreading data science can help my friends!


Sheng-Kai Chen

Data Scientist

Sheng-Kai Chen is a graduate student at the University of Toronto, concentrating on Information Systems & Design. After gaining experience analyzing data for retail stores and designing software for small businesses, Sheng-Kai was inspired to take on new challenges with machine learning and new techniques.


Vicky Kuo

Data Scientist

I believe that success isn't just about individual milestones, but also about uplifting and encouraging others to reach their potential. This is why I'm passionate about combining my technical background with my eagerness to help people overcome technological hurdles and accelerate growth. When I’m not on the job, I love hiking with my two dogs or relaxing in a coffee shop. There's nothing better than having an insightful conversation over coffee, or even better, some volunteer work! Please feel free to reach out to me on LinkedIn.
