Automate ML Pipelines Using Apache Airflow
In this Guided Project, you will gain hands-on experience building a K-Nearest Neighbors (KNN) classification model for the Iris dataset, using Apache Airflow for workflow automation. You will also learn how to deploy the trained model for prediction and how to create a DAG (Directed Acyclic Graph) for a data pipeline. Automating these steps can increase productivity, reduce costs, and shorten time-to-insight. These skills are valuable for data scientists and engineers working on classification tasks and data pipelines, and they can be applied to a wide range of other datasets and workflows.
4.4 (34 Reviews)

Language
- English
Topic
- Data Science
Industries
- Information Technology
Enrollment Count
- 338
Skills You Will Learn
- Machine Learning, Python, Artificial Intelligence, Data Science
Offered By
- IBM
Estimated Effort
- 45 minutes
Platform
- SkillsNetwork
Last Update
- May 5, 2025
Why you should do this Guided Project
The Iris dataset is a well-known and widely used dataset in machine learning. It consists of measurements of flowers from three iris species and is commonly used as a benchmark for classification models. In this project, you will gain hands-on experience building a classification model with the K-Nearest Neighbors (KNN) algorithm, a popular machine learning algorithm that classifies a new sample by the majority label among its closest training samples.
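To make the KNN idea concrete, here is a minimal pure-Python sketch, not the project's own code: it measures the Euclidean distance from a query flower to every training flower, keeps the k closest, and returns the majority label. The nine hardcoded rows follow the Iris layout (sepal length, sepal width, petal length, petal width in cm, plus a species label) and serve only as a small illustrative sample. The function names `euclidean` and `knn_predict` and the choice k=3 are assumptions; the project itself likely loads the full dataset and uses a library implementation such as scikit-learn's, which is also an assumption.

```python
import math
from collections import Counter

# Small illustrative sample in the Iris layout (not the full dataset):
# (sepal length, sepal width, petal length, petal width) in cm, then species.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2)))  # → setosa
print(knn_predict(TRAIN, (6.7, 3.0, 5.2, 2.3)))  # → virginica
```

Because KNN simply stores the training data and does its work at prediction time, there is no separate fitting step in this sketch. In practice, features are often scaled first so that no single measurement dominates the distance.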
This project provides a structured approach to building a classification model that can be easily adapted to other datasets and workflows. Apache Airflow automates the entire process, from data preprocessing to model evaluation and deployment, making it easy to incorporate this workflow into your own projects.
This project is also an opportunity to learn and practice Apache Airflow, an open-source platform for programmatically authoring, scheduling, and monitoring workflows. In Airflow, pipelines are defined as Python code, and a web interface lets you trigger, inspect, and monitor them, which makes Airflow a widely used tool for data scientists and engineers.
A Look at the Project Ahead
- Understand the K-Nearest Neighbors (KNN) algorithm and its use in classification tasks.
- Implement an Apache Airflow workflow to automate the process of data preprocessing, model training, and evaluation.
- Use Airflow to schedule and monitor the execution of the workflow and to visualise the results.
- Learn how to create a DAG (Directed Acyclic Graph) in Apache Airflow: a collection of tasks and their dependencies that represents a data pipeline.
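The DAG idea can be illustrated without Airflow itself. The sketch below uses Python's standard-library `graphlib` module (Python 3.9+), not Airflow's API: each task lists the tasks it depends on, a topological sort yields an order in which every task runs after its dependencies, and a cycle is rejected because it would violate the "acyclic" requirement. The task names and the helpers `run_order` and `is_dag` are hypothetical stand-ins for the pipeline stages listed above.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of upstream tasks it depends on.
# Task names are hypothetical stand-ins for the pipeline stages.
PIPELINE = {
    "load_data": set(),
    "preprocess": {"load_data"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run_order(graph):
    """Return one valid execution order; raises CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def is_dag(graph):
    """True if the dependency graph contains no cycles."""
    try:
        run_order(graph)
    except CycleError:
        return False
    return True


print(run_order(PIPELINE))
# → ['load_data', 'preprocess', 'train', 'evaluate', 'deploy']

# Making load_data depend on deploy closes a loop, so the graph is no longer a DAG.
cyclic = {**PIPELINE, "load_data": {"deploy"}}
print(is_dag(PIPELINE), is_dag(cyclic))  # → True False
```

Airflow's scheduler performs this kind of dependency resolution for you. In an Airflow DAG file, the same chain is typically declared between task objects with the `>>` operator, for example `load_data >> preprocess >> train`.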
What You'll Need
This Guided Project mainly uses Python. Familiarity with Python is recommended, but no prior experience is required, as this Guided Project is designed for complete beginners.
Instructors
Joseph Santarcangelo
Senior Data Scientist at IBM
Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.
Jigisha Barbhaya
Data Scientist
I am a data scientist at IBM and lead instructor at Skills Network. I love to learn and educate. I have completed my MSc (Computer Application) with a specialisation in data science from Symbiosis University.
Contributors
Efkan Serhat Goktepe
Developer | Architect
Efkan is a fourth-year student in Computer Science at the University of Toronto. Efkan is currently working at IBM as a Software Architect. Contact: efkan@ibm.com.
J.C. (Junxing) Chen
Data Scientist at IBM
Data science is easy and helpful! I want to introduce everyone to data science and help them use it in everyday life. Beyond being a data science guide, I want to make friends with people like you! As a data scientist, I hope that spreading data science can help my friends!
Sheng-Kai Chen
Data Scientist
Sheng-Kai Chen is a graduate student at the University of Toronto, concentrating on Information Systems & Design. After gaining experience analyzing data for retail stores and designing small software applications for small businesses, Sheng-Kai was inspired to take on new challenges with machine learning and new techniques.
Vicky Kuo
Data Scientist
I believe that success isn't just about individual milestones, but also about uplifting and encouraging others to reach their potential. This is why I'm passionate about combining my technical background with my eagerness to help people overcome technological hurdles and accelerate growth. When I’m not on the job, I love hiking with my two dogs or relaxing in a coffee shop. There's nothing better than having an insightful conversation over coffee, or even better, some volunteer work! Please feel free to reach out to me on LinkedIn.