Capstone Project: Data Engineering

Premium
Advanced Course

This Capstone Project is designed for you to apply and showcase your skills and knowledge in Data Engineering, including SQL, NoSQL, RDBMS, Bash, Python, ETL, Data Warehousing, BI tools, and Big Data.

Language

  • English

Topic

  • Database

Skills You Will Learn

  • Relational Database Management Systems (RDBMS), Extract Transform Load (ETL), BI Dashboards, NoSQL, Bash, Python

Offered By

  • IBMSkillsNetwork

Estimated Effort

  • 16 Hours

Platform

  • SkillsNetwork

Last Update

  • April 14, 2025

About this Course

In this Capstone Project, you will demonstrate your capabilities as a Data Engineer by designing, implementing, and managing a comprehensive data and analytics platform. This platform will include relational and non-relational databases, data warehouses, data pipelines, big data processing engines, and Business Intelligence (BI) tools. 

You will apply and refine the skills and knowledge acquired throughout the IBM Data Engineering Professional Certificate, utilizing various tools and technologies to design databases, collect data from multiple sources, implement ETL pipelines, and create analytical reports and BI dashboards. Additionally, you will implement predictive analytics and machine learning models using big data tools and techniques. 

This Capstone Project involves significant hands-on lab work, where you will showcase your proficiency with Python, Bash scripts, SQL, NoSQL, RDBMSs, data pipelines, MySQL, PostgreSQL, Db2, MongoDB, Apache Airflow, Apache Kafka, Apache Spark, and Cognos Analytics. 

Upon successful completion of this Capstone, you will have the confidence and portfolio needed to tackle real-world data engineering projects and demonstrate your abilities as an entry-level data engineer. 

What you will learn: 

After completing this course, you will be able to: 
  • Design and create a complete data and analytics platform.
  • Set up, manage, and query relational and NoSQL databases.
  • Build data pipelines and ETL processes using Apache Airflow. 
  • Design and populate a star/snowflake schema data warehouse and query it using SQL. 
  • Analyze warehouse data using a Business Intelligence (BI) tool such as Cognos Analytics to create reports and dashboards.
  • Use Apache Spark to deploy a big data machine learning model.

Course Syllabus

Module 1 - Data Platform Architecture and OLTP Database
  • Assignment Overview
  • OLTP Database
  • OLTP Database Requirements and Design
  • Hands-on Lab: OLTP Database
  • Checklist: OLTP Database
  • Graded Quiz: OLTP Database

Module 2 - Querying Data in NoSQL Databases
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Querying data in NoSQL databases
  • Hands-on Lab: Querying data in NoSQL databases
  • Checklist: Querying data in NoSQL databases
  • Graded Quiz: Querying data in NoSQL databases

Module 3 - Build a Data Warehouse
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Hands-on Lab: Data Warehousing
  • Checklist: Data Warehousing
  • Hands-on Lab: Data Warehousing Reporting
  • Checklist: Data Warehousing Reporting
  • Graded Quiz: Data Warehousing & Reporting

Module 4 - Data Analytics
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Hands-on Lab: Getting Started with Cognos Dashboard Embedded (Optional)
  • Dashboard Creation
  • Hands-on Lab: Dashboard Creation
  • Checklist: Dashboard Creation
  • Graded Quiz: Dashboard Creation

Module 5 - ETL & Data Pipelines
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • ETL & Data Pipelines using Apache Airflow
  • Hands-on Lab: ETL
  • Checklist: ETL
  • Hands-on Lab: Data Pipelines using Apache Airflow
  • Checklist: Data Pipelines using Apache Airflow
  • Graded Quiz: ETL & Data Pipelines using Apache Airflow

Module 6 - Big Data Analytics with Spark
  • Module Introduction and Learning Objectives
  • Assignment Overview
  • Big Data Analytics with Spark
  • Practice Hands-on Lab: Saving and loading a SparkML model
  • Hands-on Lab: SparkML Ops
  • Checklist: Big Data Analytics with Spark
  • Graded Quiz: Big Data Analytics with Spark

Module 7 - Final Submission
  • Final Project
  • Peer Review: Submit your Work and Review your Peers
  • Wrap Up
  • Reading: Congrats & Next Steps
  • Reading: Team & Acknowledgements

General Information

  • This course is self-paced. 
  • This platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer, or Safari. 

Recommended Skills Prior to Taking this Course

To be successful in this course, you should have a working knowledge of:
  • Relational and NoSQL databases 
  • SQL, Python, and Shell scripting 
  • ETL with Airflow and Kafka 
  • Data Warehousing, Cubes, Rollups, and Materialized views/MQTs 
  • BI tools such as Cognos Analytics 
  • Big Data analytics with Apache Spark 

It is highly recommended that you complete all the preceding courses in the IBM Data Engineering Professional Certificate.