From Chaos to Order: Automate Documents Categorization by AI
Construct a news classifier for a content search engine using TorchText, while gaining a deep understanding of NLP fundamentals, including embeddings and tokenization. The classifier sorts headlines into World, Sports, Business, and Science/Tech, categories you can adapt to your specific use case.
4.9 (14 Reviews)

Language
- English
Topic
- Artificial Intelligence
Enrollment Count
- 148
Skills You Will Learn
- Deep Learning, PyTorch, Machine Learning, Natural Language Processing, Python, LLM
Offered By
- IBM
Estimated Effort
- 1 hour
Platform
- SkillsNetwork
Last Update
- May 9, 2025
Natural Language Processing (NLP) is central to understanding how Large Language Models (LLMs) work. In this project, we explore the fundamentals of NLP, from tokenization to embeddings, to see how these models represent and use language. Learning these concepts gives you a new perspective on LLMs, which sit at the high end of NLP capability and can make sense of words and sentences, including the nuances of language. The project follows a structured approach: it starts with hands-on practice of the basics and gradually progresses to implementing your own news classifier. Along the way, you will develop practical skills and insights for building text classification models for real-world applications.
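To make the path from tokenization to token indices concrete, here is a minimal sketch in plain Python. It is not the course's code: the regular-expression tokenizer, the `<unk>` convention, the helper names (`tokenize`, `build_vocab`, `encode`), and the sample headlines are all assumptions chosen for illustration. The course itself uses TorchText for these steps.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    This is a deliberately simple rule (an assumption for illustration);
    real tokenizers handle punctuation, contractions, and subwords.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts, unk_token="<unk>"):
    """Map each distinct token to an integer index.

    Index 0 is reserved for tokens that never appeared in `texts`.
    """
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {unk_token: 0}
    for tok, _ in counts.most_common():
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert a string into a list of token indices using `vocab`."""
    return [vocab.get(tok, 0) for tok in tokenize(text)]


# Hypothetical headlines used only to build a tiny vocabulary.
headlines = [
    "Stocks rally as markets open",
    "Team wins the championship final",
]
vocab = build_vocab(headlines)

# "after" is not in the vocabulary, so it maps to the unknown index 0.
indices = encode("Markets rally after the final", vocab)
print(indices)
```

These integer indices are what an embedding layer consumes: each index selects one row of a learned embedding matrix.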
A Look at the Project Ahead
- Work with datasets and understand tokenizers, vocabularies, and the embedding bag technique.
- Explore embeddings in PyTorch and understand token indices.
- Perform text classification by feeding batches from a data loader into a neural network model.
- Train the text classification model on a news dataset.
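The steps above can be sketched end to end in PyTorch. This is an illustrative outline under stated assumptions, not the course's implementation: the class name `TextClassifier`, the helper `collate_batch`, the layer sizes, the hyperparameters, and the tiny hand-made dataset of token indices are all hypothetical. The label-to-index mapping (0 = World, 1 = Sports, 2 = Business, 3 = Science/Tech) is likewise assumed for the example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


class TextClassifier(nn.Module):
    """EmbeddingBag followed by a linear layer (a common simple baseline)."""

    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        # EmbeddingBag looks up each token's vector and averages per document.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.fc(self.embedding(token_ids, offsets))


def collate_batch(batch):
    """Flatten variable-length documents into one tensor plus start offsets."""
    labels, ids, offsets = [], [], [0]
    for label, token_ids in batch:
        labels.append(label)
        ids.extend(token_ids)
        offsets.append(offsets[-1] + len(token_ids))
    return (
        torch.tensor(labels, dtype=torch.long),
        torch.tensor(ids, dtype=torch.long),
        torch.tensor(offsets[:-1], dtype=torch.long),  # drop the final end offset
    )


torch.manual_seed(0)

# Hypothetical (label, token indices) pairs standing in for encoded headlines.
# Assumed labels: 0 = World, 1 = Sports, 2 = Business, 3 = Science/Tech.
data = [(0, [1, 2, 3]), (1, [4, 5]), (2, [6, 7, 8, 9]), (3, [2, 9])]
loader = DataLoader(data, batch_size=2, shuffle=True, collate_fn=collate_batch)

model = TextClassifier(vocab_size=10, embed_dim=8, num_classes=4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A few passes over the toy data; real training needs far more data and epochs.
for epoch in range(5):
    for labels, token_ids, offsets in loader:
        optimizer.zero_grad()
        loss = criterion(model(token_ids, offsets), labels)
        loss.backward()
        optimizer.step()

# Classify a single document: one bag starting at offset 0.
with torch.no_grad():
    logits = model(torch.tensor([1, 2, 3]), torch.tensor([0]))
predicted_class = int(logits.argmax(dim=1))
print(predicted_class)
```

The offsets tensor is what lets `nn.EmbeddingBag` handle headlines of different lengths without padding: it marks where each document starts inside the flattened index tensor.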
What You'll Need

Instructors
Joseph Santarcangelo
Senior Data Scientist at IBM
Joseph has a Ph.D. in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. He has worked at IBM since completing his Ph.D.
Roodra Kanwar
Data Scientist at IBM
I am a data scientist by day, superhero by night. Psych! I wish I was that cool. Only the former part is true, which is still pretty cool! I believe constant learning is an essential part of being a productive data enthusiast. I am also pursuing my master's in computer science at Simon Fraser University, specializing in Big Data. Moreover, knowledge is transfer learning (pun intended!), and I plan to give what I have gained back to the data community.