Machine Learning with Apache Spark
This course introduces the fundamentals of Machine Learning (ML) with Apache Spark, covering Spark Structured Streaming, ETL for ML Pipelines, and Spark ML. By the end of the course, you’ll gain hands-on experience applying Spark skills to ETL and ML workflows.

Language
- English
Topic
- Big Data
Skills You Will Learn
- Unsupervised Learning, Machine Learning, Graph Theory, Data Engineering, Apache Spark, Batch Processing
Offered By
- IBMSkillsNetwork
Estimated Effort
- 15 Hours
Platform
- SkillsNetwork
Last Update
- October 3, 2025
In this course, you will learn the fundamentals of Machine Learning (ML) and Generative AI (GenAI). You will cover the ML model lifecycle, and explore supervised and unsupervised learning. You will practice working with ML models for classification, regression, and clustering.
You will explore concepts and gain hands-on skills for using Spark in data engineering and machine learning applications. You’ll learn about Spark Structured Streaming, including data sources, output modes, and operations. Additionally, you will explore graph theory and see how GraphFrames extends Spark DataFrames to support popular graph algorithms.
Learn how to use Spark for extract, transform, and load (ETL) processes, and practice your skills in the "ETL for Machine Learning Pipelines" lab.
What you will learn:
- Describe the fundamentals of Machine Learning and Generative AI
- Differentiate between supervised and unsupervised machine learning
- Implement ML algorithms for classification, regression, and clustering using Python
- Explain the features, benefits, limitations, and application of Apache Spark Structured Streaming
- Define graph theory and explain how GraphFrames benefits developers
- Describe how developers can apply extract, transform, and load (ETL) processes using Spark
- Explain how Spark ML supports machine learning development
- Apply Spark ML for regression and classification
- Explain how Spark ML uses clustering
- Demonstrate hands-on working knowledge of using Spark for ETL processes
Course Syllabus
- Spark Structured Streaming
- GraphFrames on Apache Spark
- ETL Workloads and ETL for ML Pipelines
- Spark ML Fundamentals
- Spark ML Regression and Classification
- Spark ML Clustering
- Setup & Practice Assignment
- Project Overview
- Final Assignment Project
- Final Quiz
General Information
- This course is self-paced.
- This platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer, or Safari.
Recommended Skills Prior to Taking this Course
- Foundational Spark knowledge and skills, such as those gained from the IBM course titled "Spark and Hadoop Fundamentals for Big Data Analytics."
- Working knowledge of the Python programming language.

Instructors
Ramesh Sannareddy
Corporate IT Trainer
Ramesh Sannareddy holds a Bachelor's degree in Information Systems (Birla Institute of Technology, Pilani). He has two and a half decades of experience in Information Technology Infrastructure Management, Database Administration, Information Integration, and Automation. He has worked for companies such as Intergraph, Genpact, HCL, and Microsoft. He is currently a freelancer pursuing his passion for teaching. He teaches Data Science, Machine Learning, Programming, and Databases.
Karthik Muthuraman