ETL & Data Pipelines with Bash, Airflow and Kafka
Premium
Intermediate course
This course equips you with practical skills to build and manage data pipelines and ETL & ELT processes using shell scripts, Airflow, and Kafka. Practice course concepts using hands-on labs and projects.
Language
- English
Topic
- Database
Skills You Will Learn
- Apache Airflow, Apache Kafka, Bash, Data Pipeline, Extract Transform Load (ETL), Data Warehouse
Offered By
- IBMSkillsNetwork
Estimated Effort
- 17 Hours
Platform
- SkillsNetwork
Last Update
- October 25, 2024
About this course
Well-designed and automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. Defining your data workflows, pipelines, and processes early in the platform design ensures the right raw data is collected, transformed, and loaded into desired storage layers, making it available for processing and analysis as needed.
This course is designed to provide you with the critical knowledge and skills needed by Data Engineers and Data Warehousing specialists to create and manage ETL, ELT, and data pipeline processes.
Upon completing this course, you’ll gain a solid understanding of:
- Extract, Transform, Load (ETL), and Extract, Load, Transform (ELT) processes
- Extracting data, transforming it, and loading transformed data into a staging area
- Creating an ETL data pipeline using Bash shell scripting
- Building a batch workflow using Apache Airflow
- Constructing a streaming data pipeline using Apache Kafka
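To make the extract, transform, and load steps listed above concrete, here is a minimal sketch in Python. It is not taken from the course materials: the sample data, the `staging_readings` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real staging area.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from a file, API, or database.
RAW_CSV = """id,name,temp_f
1,alpha,32
2,beta,212
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types, upper-case names, convert Fahrenheit to Celsius."""
    return [
        (
            int(row["id"]),
            row["name"].upper(),
            round((float(row["temp_f"]) - 32) * 5 / 9, 1),
        )
        for row in rows
    ]


def load(records, conn):
    """Load: write transformed records into a (hypothetical) staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_readings "
        "(id INTEGER, name TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO staging_readings VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load, then return the staged rows."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT id, name, temp_c FROM staging_readings ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for staged_row in run_pipeline(connection):
        print(staged_row)
```

In an ELT variant of the same idea, the raw rows would be loaded into the target store first and the transformation would run there afterwards, for example as a SQL statement.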
You’ll gain hands-on experience through practice labs throughout the course and work on a project, inspired by real-world scenarios, to build data pipelines using various technologies. This project can be added to your portfolio to demonstrate your ability to perform as a Data Engineer.
What you will learn:
After completing this course, you will be able to:
- Describe and differentiate between Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes
- Define data pipeline components, processes, tools and technologies
- Create ETL processes using Bash shell scripts
- Develop batch data pipelines using Apache Airflow
- Create streaming data pipelines using Apache Kafka
Course Syllabus
Module 1 - Data Processing Techniques
- Lesson 1 - ETL and ELT Processes
- Module Introduction & Learning Objectives
- ETL Fundamentals
- ELT Basics
- Comparing ETL to ELT
- Data Extraction Techniques
- Introduction to Data Transformation Techniques
- Data Loading Techniques
- Interactivity: Tell the difference between ETL and ELT
- Summary & Highlights
- Practice Quiz: ETL and ELT Processes
- Graded Quiz: ETL and ELT Processes
Module 2 - ETL & Data Pipelines: Tools and Techniques
- Lesson 1 - ETL using Shell Scripts
- Module Introduction & Learning Objectives
- Introduction to the lesson for those not familiar with Linux commands and Shell Scripting
- ETL Techniques
- ETL using Shell Scripting
- Hands-on Lab: ETL using Shell Scripts
- Summary & Highlights
- Practice Quiz: ETL using Shell Scripts
- Graded Quiz: ETL using Shell Scripts
- Lesson 2 - An Introduction to Data Pipelines
- An Introduction to Data Pipelines
- Introduction to Data Pipelines
- Key Data Pipeline Processes
- Batch Versus Streaming Data Pipeline Use Cases
- Data Pipeline Tools and Technologies
- Interactivity: Differentiate between Batch Processing and Stream Processing
- Summary & Highlights
- Practice Quiz: An Introduction to Data Pipelines
- Graded Quiz: An Introduction to Data Pipelines
Module 3 - Building Data Pipelines using Airflow
- Lesson 1 - Using Apache Airflow to Build Data Pipelines
- Using Apache Airflow to build Data Pipelines
- Apache Airflow Overview
- Advantages of Using Data Pipelines as DAGs in Apache Airflow
- Apache Airflow UI
- Hands-on Lab: Getting Started with Apache Airflow
- Build DAG Using Airflow
- Hands-on Lab: Create a DAG for Apache Airflow
- Airflow Monitoring and Logging
- Hands-on Lab: Monitoring a DAG
- Summary & Highlights
- Practice Quiz: Using Apache Airflow to build Data Pipelines
- Graded Quiz: Using Apache Airflow to build Data Pipelines
Module 4 - Building Streaming Pipelines using Kafka
- Lesson 1 - Using Apache Kafka to Build Pipelines for Streaming Data
- Using Apache Kafka to build Pipelines for Streaming Data
- Distributed Event Streaming Platform Components
- Apache Kafka Overview
- Building Event Streaming Pipelines using Kafka
- Kafka Streaming Process
- Hands-on Lab: Working with Streaming Data using Kafka
- (Optional) Reading/Hands-on Lab: Message Keys and offset
- Kafka Python Client
- Summary & Highlights
- Practice Quiz: Using Apache Kafka to build Pipelines for Streaming Data
- Graded Quiz: Using Apache Kafka to build Pipelines for Streaming Data
Final Assignment
- Project Overview
- Hands-on Lab: Build an ETL Pipeline using Airflow
- Hands-on Lab: Build a streaming ETL Pipeline using Kafka
- Peer Review: Project Submission and Peer Review
Final Quiz
- Graded Quiz: Final Quiz
Course Wrap-up
- Reading: Team & Acknowledgements
- Reading: Congrats & Next Steps
General Information
- This course is self-paced.
- This platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer, or Safari.
Recommended Skills Prior to Taking this Course
- This course requires prior experience working with datasets, SQL, relational databases, and Bash shell scripts.