Capstone Project: Data Engineering

Premium
Advanced Course

This Capstone Project is designed for you to apply and showcase your skills and knowledge in Data Engineering, including SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools, and Big Data.

Language

  • English

Topic

  • Database

Skills You Will Learn

  • Relational Database Management Systems (RDBMS), Extract Transform Load (ETL), BI Dashboards, NoSQL, Bash, Python

Offered By

  • IBMSkillsNetwork

Estimated Effort

  • 16 Hours

Platform

  • SkillsNetwork

Last Update

  • November 7, 2025

About this Course
In this Capstone Project, you will demonstrate your capabilities as a Data Engineer by designing, implementing, and managing a comprehensive data and analytics platform. This platform will include relational and non-relational databases, data warehouses, data pipelines, big data processing engines, and Business Intelligence (BI) tools. 

You will apply and refine the skills and knowledge acquired throughout the IBM Data Engineering Professional Certificate, utilizing various tools and technologies to design databases, collect data from multiple sources, implement ETL pipelines, and create analytical reports and BI dashboards. Additionally, you will implement predictive analytics and machine learning models using big data tools and techniques. 

This Capstone Project involves significant hands-on lab work, where you will showcase your proficiency with Python, Bash scripts, SQL, NoSQL, RDBMSs, data pipelines, MySQL, PostgreSQL, Db2, MongoDB, Apache Airflow, Apache Kafka, Apache Spark, and Cognos Analytics. 

Upon successful completion of this Capstone, you will have the confidence and portfolio needed to tackle real-world data engineering projects and demonstrate your abilities as an entry-level data engineer. 

What you will learn: 

After completing this course, you will be able to: 
  • Create and design a complete data and analytics platform. 
  • Set up, manage, and query relational and NoSQL databases.
  • Build data pipelines and ETL processes using Apache Airflow. 
  • Design and populate a star/snowflake schema data warehouse and query it using SQL. 
  • Analyze warehouse data using a Business Intelligence (BI) tool such as Cognos Analytics to create reports and dashboards.
  • Utilize Apache Spark to deploy a big data machine learning model.

Course Syllabus

Module 1 - Data Platform Architecture and OLTP Database
  • Assignment Overview
  • OLTP Database
  • OLTP Database Requirements and Design
  • Hands-on Lab: OLTP Database
  • Checklist: OLTP Database
  • Graded Quiz: OLTP Database

Module 2 - Querying Data in NoSQL Databases
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Querying data in NoSQL databases
  • Hands-on Lab: Querying data in NoSQL databases
  • Checklist: Querying data in NoSQL databases
  • Graded Quiz: Querying data in NoSQL databases

Module 3 - Build a Data Warehouse
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Hands-on Lab: Data Warehousing
  • Checklist: Data Warehousing
  • Hands-on Lab: Data Warehousing Reporting
  • Checklist: Data Warehousing Reporting
  • Graded Quiz: Data Warehousing & Reporting

Module 4 - Data Analytics
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Hands-on Lab: Getting Started with Cognos Dashboard Embedded (Optional)
  • Dashboard Creation
  • Hands-on Lab: Dashboard Creation
  • Checklist: Dashboard Creation
  • Graded Quiz: Dashboard Creation

Module 5 - ETL & Data Pipelines
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • ETL & Data Pipelines using Apache Airflow
  • Hands-on Lab: ETL
  • Checklist: ETL
  • Hands-on Lab: Data Pipelines using Apache Airflow
  • Checklist: Data Pipelines using Apache Airflow
  • Graded Quiz: ETL & Data Pipelines using Apache Airflow

Module 6 - Big Data Analytics with Spark
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Big Data Analytics with Spark
  • Practice Hands-on Lab: Saving and loading a SparkML model
  • Hands-on Lab: SparkML Ops
  • Checklist: Big Data Analytics with Spark
  • Graded Quiz: Big Data Analytics with Spark

Module 7 - Final Submission
  • Final Project
  • Peer Review: Submit your Work and Review your Peers
  • Wrap Up
  • Reading: Congrats & Next Steps
  • Reading: Team & Acknowledgements

General Information

  • This course is self-paced. 
  • This platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer, or Safari. 

Recommended Skills Prior to Taking this Course

To be successful in this course, you should have a working knowledge of:
  • Relational and NoSQL databases 
  • SQL, Python, and Shell scripting 
  • ETL with Airflow and Kafka 
  • Data Warehousing, Cubes, Rollups, and Materialized views/MQTs 
  • BI tools such as Cognos Analytics 
  • Big Data analytics with Apache Spark

It is highly recommended that you complete all the preceding courses in the IBM Data Engineering Professional Certificate.

Instructors

Rav Ahuja

Global Program Director, IBM Skills Network

Rav Ahuja is a Global Program Director at IBM. He leads growth strategy, curriculum creation, and partner programs for the IBM Skills Network. Rav co-founded Cognitive Class, an IBM-led initiative to democratize skills for in-demand technologies. He is based at the IBM Canada Lab in Toronto and specializes in instructional solutions for AI, Data, Software Engineering, and Cloud. Rav presents at events worldwide and has authored numerous papers, articles, books, and courses on managing and analyzing data. Rav holds a B.Eng. from McGill University and an MBA from the University of Western Ontario.

Ramesh Sannareddy

Corporate IT Trainer

Ramesh Sannareddy holds a Bachelor's degree in Information Systems (Birla Institute of Technology, Pilani). He has two and a half decades of experience in Information Technology Infrastructure Management, Database Administration, Information Integration, and Automation. He has worked for companies such as Intergraph, Genpact, HCL, and Microsoft. He is currently a freelancer pursuing his passion for teaching. He teaches Data Science, Machine Learning, Programming, and Databases.
