Building a Machine Learning Pipeline For NLP

Beginner · Guided Project

Natural language processing (NLP) is the branch of artificial intelligence concerned with understanding written text. Sentiment analysis is an important NLP task that identifies the emotional tone behind a body of text; it is applied to customer reviews, survey responses, and online and social media content. In this project, you will classify the sentiment of movie reviews as positive, negative, or neutral, first with a rule-based method and then with machine learning. You will use pandas to load and analyze data, sklearn to process and classify the text, and other libraries such as NLTK.
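To make the rule-based idea concrete, here is a minimal sketch that assumes a tiny hand-picked lexicon. The word lists and the `rule_based_sentiment` helper are hypothetical and not the course's code. A review scores +1 for each positive word and -1 for each negative word, and the sign of the total gives the label.

```python
import re

# Hypothetical toy lexicons; a real rule-based system would rely on a
# curated sentiment lexicon rather than these hand-picked words.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "enjoyed"}
NEGATIVE_WORDS = {"bad", "boring", "awful", "hate", "terrible"}


def rule_based_sentiment(review: str) -> str:
    """Label a review by counting positive and negative lexicon hits."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no hits, or hits that cancel out


reviews = [
    "A great movie, I loved the cast.",
    "Boring plot and terrible acting.",
    "The movie is two hours long.",
]
for review in reviews:
    print(rule_based_sentiment(review), "|", review)
```

One way to see the link to machine learning: a linear classifier such as logistic regression trained on word counts also sums a weight per word, but it learns those weights from labeled reviews instead of fixing them at +1 or -1 by hand.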

4.3 (41 Reviews)

Language

  • English

Topic

  • Artificial Intelligence

Industries

  • Financial Services, Healthcare, Government

Enrollment Count

  • 400

Skills You Will Learn

  • Natural Language Processing (NLP), Machine Learning, Python

Offered By

  • IBM

Estimated Effort

  • 2 hours

Platform

  • SkillsNetwork

Last Update

  • May 5, 2024

About This Guided Project

Why you should do this Guided Project

Sentiment analysis is an important part of NLP that identifies the emotional tone behind a body of text; it is applied to customer reviews, survey responses, and online and social media content. In this project, you will classify the sentiment of movie reviews as positive, negative, or neutral. We start with a rule-based method, then move to machine learning, explaining the connection between the two. You will use pandas to load, analyze, and process your data, then use sklearn to transform it with Bag-of-Words or Term Frequency–Inverse Document Frequency (TF-IDF) transforms and classify its sentiment with machine learning. You will streamline the process with machine learning pipelines and perform hyperparameter selection in one shot. Finally, you will use libraries such as the Natural Language Toolkit (NLTK) to improve performance. Each section includes toy examples to help you build intuition.

A Look at the Project Ahead

  • Understand sentiment analysis
  • Apply pandas to load, analyze, and process your data
  • Understand text preprocessing
  • Understand the connection between rule-based methods and machine learning-based methods
  • Understand and apply Bag-of-Words and Term Frequency–Inverse Document Frequency (TF-IDF) to sentiment analysis using scikit-learn
  • Apply hyperparameter selection to NLP using scikit-learn
  • Apply machine learning pipelines to NLP using scikit-learn
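Pipelines and hyperparameter selection fit together naturally: the vectorizer and classifier are chained into one estimator, and `GridSearchCV` cross-validates every parameter combination in a single call. This is a minimal sketch assuming scikit-learn and a made-up twelve-review training set; the grid values are illustrative, not the course's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny hypothetical training set; the course works with real movie reviews.
reviews = [
    "great movie, loved it", "excellent acting and plot", "wonderful and fun",
    "really enjoyed this film", "a great fun watch", "loved the excellent cast",
    "terrible movie, hated it", "boring and awful plot", "bad acting, dull story",
    "really hated this film", "an awful boring watch", "dull and terrible cast",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

# Chaining the steps means the vectorizer is refit on each cross-validation
# training fold, so vocabulary from the held-out fold does not leak in.
pipe = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Grid keys follow scikit-learn's "<step name>__<parameter>" convention.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# Try all 6 combinations with 3-fold cross-validation, then refit the
# best one on the full training set.
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(reviews, labels)
print(search.best_params_)
print(search.predict(["loved this great film", "awful and boring"]))
```

Searching over `vect__ngram_range` alongside `clf__C` is the payoff of putting the vectorizer inside the pipeline: text-processing choices are tuned with the same cross-validation as the model's regularization strength.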

What You'll Need

You will need to know how to program in Python and be somewhat familiar with pandas, sklearn, and logistic regression.

Instructors

Joseph Santarcangelo

Senior Data Scientist at IBM

Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his Ph.D.

Contributors

Monireh Ebrahimi

Sr. Cognitive Software Developer

Monireh Ebrahimi is a Senior Cognitive Software Developer at IBM’s Center for Open-Source Data and AI Technologies (CODAIT) in San Francisco, where she works on open-source data and AI technologies. She was awarded an “Outstanding Technical Award” in 2021 for her contributions to Text Extensions for Pandas. She received her Ph.D. from the Data Semantics (DaSe) lab at Kansas State University, with a major focus on neuro-symbolic integration. Her primary research interests include deep learning, knowledge graphs, reasoning, the Semantic Web, and natural language processing. She is also very interested in applying NLP and data science to real-world applications and in working with customers and partners across industries to help them on their data science journeys. Monireh has over 15 peer-reviewed publications and three patents, has served as a PC member or reviewer for artificial intelligence venues (NeurIPS, AAAI, IJCAI, ICML, ICLR, JAIR) and Semantic Web conferences (ISWC, ESWC, TheWebConf), and received the Most Outstanding Reviewer Award at WWW 2017.
