Capstone Project: Data Engineering
This Capstone Project is designed for you to apply and showcase your skills and knowledge in Data Engineering, including SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools, and Big Data.

Language
- English
Topic
- Database
Skills You Will Learn
- Relational Database Management Systems (RDBMS), Extract Transform Load (ETL), BI Dashboards, NoSQL, Bash, Python
Offered By
- IBMSkillsNetwork
Estimated Effort
- 16 Hours
Platform
- SkillsNetwork
Last Update
- November 7, 2025
Upon successful completion of this Capstone, you will have the confidence and portfolio needed to tackle real-world data engineering projects and demonstrate your abilities as an entry-level data engineer.
What you will learn:
- Create and design a complete data and analytics platform.
- Set up, manage, and query relational and NoSQL databases.
- Build data pipelines and ETL processes using Apache Airflow.
- Design and populate a star/snowflake schema data warehouse and query it using SQL.
- Analyze warehouse data using a Business Intelligence (BI) tool such as Cognos Analytics to create reports and dashboards.
- Utilize Apache Spark to deploy a big data machine learning model.
Course Syllabus
- Assignment Overview
- OLTP Database
- OLTP Database Requirements and Design
- Hands-on Lab: OLTP Database
- Checklist: OLTP Database
- Graded Quiz: OLTP Database
Module 2 - Querying Data in NoSQL Databases
- Module Introduction and Learning Objectives
- Assignment Overview
- Querying data in NoSQL databases
- Hands-on Lab: Querying data in NoSQL databases
- Checklist: Querying data in NoSQL databases
- Graded Quiz: Querying data in NoSQL databases
Module 3 - Build a Data Warehouse
- Module Introduction and Learning Objectives
- Assignment Overview
- Hands-on Lab: Data Warehousing
- Checklist: Data Warehousing
- Hands-on Lab: Data Warehousing Reporting
- Checklist: Data Warehousing Reporting
- Graded Quiz: Data Warehousing & Reporting
Module 4 - Data Analytics
- Module Introduction and Learning Objectives
- Assignment Overview
- Hands-on Lab: Getting Started with Cognos Dashboard Embedded (Optional)
- Dashboard Creation
- Hands-on Lab: Dashboard Creation
- Checklist: Dashboard Creation
- Graded Quiz: Dashboard Creation
Module 5 - ETL & Data Pipelines
- Module Introduction and Learning Objectives
- Assignment Overview
- ETL & Data Pipelines using Apache Airflow
- Hands-on Lab: ETL
- Checklist: ETL
- Hands-on Lab: Data Pipelines using Apache Airflow
- Checklist: Data Pipelines using Apache Airflow
- Graded Quiz: ETL & Data Pipelines using Apache Airflow
Module 6 - Big Data Analytics with Spark
- Module Introduction and Learning Objectives
- Assignment Overview
- Big Data Analytics with Spark
- Practice Hands-on Lab: Saving and loading a SparkML model
- Hands-on Lab: SparkML Ops
- Checklist: Big Data Analytics with Spark
- Graded Quiz: Big Data Analytics with Spark
Module 7 - Final Submission
- Final Project
- Peer Review: Submit your Work and Review your Peers
- Wrap Up
- Reading: Congrats & Next Steps
- Reading: Team & Acknowledgements
General Information
- This course is self-paced.
- This platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer, or Safari.
Recommended Skills Prior to Taking this Course
- Relational and NoSQL databases
- SQL, Python, and Shell scripting
- ETL with Airflow and Kafka
- Data Warehousing, Cubes, Rollups, and Materialized views/MQTs
- BI tools such as Cognos Analytics
- Big Data analytics with Apache Spark

Instructors
Rav Ahuja
Global Program Director, IBM Skills Network
Rav Ahuja is a Global Program Director at IBM. He leads growth strategy, curriculum creation, and partner programs for the IBM Skills Network. Rav co-founded Cognitive Class, an IBM-led initiative to democratize skills for in-demand technologies. He is based at the IBM Canada Lab in Toronto and specializes in instructional solutions for AI, Data, Software Engineering, and Cloud. Rav presents at events worldwide and has authored numerous papers, articles, books, and courses on managing and analyzing data. Rav holds a B.Eng. from McGill University and an MBA from the University of Western Ontario.
Ramesh Sannareddy
Corporate IT Trainer
Ramesh Sannareddy holds a Bachelor's degree in Information Systems (Birla Institute of Technology, Pilani). He has two and a half decades of experience in Information Technology Infrastructure Management, Database Administration, Information Integration, and Automation. He has worked for companies such as Intergraph, Genpact, HCL, and Microsoft. He is currently a freelancer pursuing his passion for teaching. He teaches Data Science, Machine Learning, Programming, and Databases.