Great Expectations, a data validation library for Python
Garbage in, garbage out, but sometimes gold is wrongly thrown out with the garbage. In a data science project, the pipeline from dataset to machine learning model requires data in an appropriate format. But how can we check that our datasets are in good shape before modelling? Great Expectations is well suited to this task. In this project, you will learn how to do data exploration using Great Expectations, a Python-based open-source library for validating, documenting, and profiling your data. It helps you maintain data quality and improve models.
4.6 (24 Reviews)
Language
- English
Topic
- Data Science
Enrollment Count
- 161
Skills You Will Learn
- Artificial Intelligence, Python, Data Governance, Machine Learning, Data Science
Offered By
- IBM
Platform
- SkillsNetwork
Last Update
- December 7, 2024
Data teams use Great Expectations to:
- Test data they ingest from other teams or vendors and ensure its validity.
- Validate data they transform as a step in their data pipeline in order to ensure the correctness of transformations.
- Prevent data quality issues from slipping into data products.
- Streamline knowledge capture from subject-matter experts and make implicit knowledge explicit.
- Develop rich, shared documentation of their data.
In this project, we show the basic usage of Great Expectations and apply it to an example bank churn dataset, modified from one provided by Kaggle.
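To make the idea of an "expectation" concrete before the project starts, here is a minimal sketch in plain Python. It is not the Great Expectations API: the helper functions are hand-written stand-ins, and the rows and column names (`age`, `balance`, `churned`) are hypothetical rather than taken from the project's dataset. Each check is a declarative statement about a column that reports whether the data satisfies it.

```python
# Plain-Python stand-ins for the expectation idea; NOT the Great Expectations API.
# The rows and column names below are hypothetical, not the project's dataset.

rows = [
    {"customer_id": 1, "age": 42, "balance": 1500.0, "churned": 0},
    {"customer_id": 2, "age": 35, "balance": None, "churned": 1},
    {"customer_id": 3, "age": 230, "balance": 320.5, "churned": 0},
]


def _result(check, bad_rows):
    """Package the outcome of one check in a uniform shape."""
    return {"check": check, "success": not bad_rows, "unexpected_count": len(bad_rows)}


def expect_not_null(rows, column):
    """Succeeds when no row has a missing value in `column`."""
    bad = [r for r in rows if r.get(column) is None]
    return _result(f"{column} is not null", bad)


def expect_between(rows, column, min_value, max_value):
    """Succeeds when every non-missing value lies in [min_value, max_value]."""
    bad = [
        r for r in rows
        if r.get(column) is not None and not (min_value <= r[column] <= max_value)
    ]
    return _result(f"{column} is between {min_value} and {max_value}", bad)


def expect_in_set(rows, column, allowed):
    """Succeeds when every value in `column` is one of `allowed`."""
    bad = [r for r in rows if r.get(column) not in allowed]
    return _result(f"{column} is in {sorted(allowed)}", bad)


results = [
    expect_not_null(rows, "balance"),
    expect_between(rows, "age", 18, 120),
    expect_in_set(rows, "churned", {0, 1}),
]

for result in results:
    status = "PASS" if result["success"] else "FAIL"
    print(f"{status}: {result['check']} (unexpected rows: {result['unexpected_count']})")
```

With this sample data, the first two checks fail (one missing balance, one implausible age) and the third passes. Returning a result record instead of raising immediately lets all checks run, so a single report can list every problem in the data.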
A Look at the Project Ahead
- What to check in a dataset before building a machine learning model
- The basics of Great Expectations
- How to use Great Expectations to create pipelines that make sure the dataset is good for the model
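The last point, checking the data inside a pipeline before it reaches the model, can be sketched roughly as follows. This is an illustrative outline in plain Python under stated assumptions, not the Great Expectations API: `DataValidationError`, `validate`, `run_pipeline`, and `train_stub` are hypothetical names, and the columns are invented for the example.

```python
# Hypothetical sketch of a validation gate in front of a modelling step.
# Plain Python only; this is NOT the Great Expectations API.


class DataValidationError(Exception):
    """Raised when the dataset fails one or more checks."""


def validate(rows, checks):
    """Run each (description, predicate) pair; return descriptions of failed checks."""
    return [
        description
        for description, predicate in checks
        if not all(predicate(row) for row in rows)
    ]


def run_pipeline(rows, checks, train):
    """Call `train(rows)` only if every check passes; otherwise raise."""
    failures = validate(rows, checks)
    if failures:
        raise DataValidationError("; ".join(failures))
    return train(rows)


# Hypothetical checks on hypothetical columns.
checks = [
    ("age is present", lambda r: r.get("age") is not None),
    ("age is between 18 and 120",
     lambda r: r.get("age") is not None and 18 <= r["age"] <= 120),
    ("churned is 0 or 1", lambda r: r.get("churned") in (0, 1)),
]


def train_stub(rows):
    """Stand-in for a model-training step; only reports the row count."""
    return f"trained on {len(rows)} rows"


good_rows = [{"age": 42, "churned": 0}, {"age": 35, "churned": 1}]
bad_rows = [{"age": 42, "churned": 0}, {"age": None, "churned": 2}]

print(run_pipeline(good_rows, checks, train_stub))

try:
    run_pipeline(bad_rows, checks, train_stub)
except DataValidationError as err:
    print(f"pipeline stopped: {err}")
```

The design choice here is that the training step never sees data that failed validation: the gate raises before `train` is called, so bad data stops the pipeline instead of silently degrading the model.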
What You'll Need
Remember that the IBM Skills Network Labs environment comes with many things pre-installed (e.g. Docker) to save you the hassle of setting everything up.
Instructors
J.C. (Junxing) Chen
Data scientist at IBM
Data science is easy and helpful! I want everyone to know about data science and to help everyone use it in everyday life. Beyond being a data science guide, I want to make friends with people like you! As a data scientist, I hope that spreading data science can help my friends!
Joseph Santarcangelo
Senior Data Scientist at IBM
Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.
Roxanne Li
Data Scientist at IBM
I am an aspiring Data Scientist at IBM with extensive theoretical/academic, research, and work experience in different areas of Machine Learning, including Classification, Clustering, Computer Vision, NLP, and Generative AI. In the past, I have applied Machine Learning to build data products for the P&C insurance industry. I also recently became an instructor of the Unsupervised Machine Learning course by IBM on Coursera!