Web Scraping for Python using Beautiful Soup

Beginner · Guided Project

Data is the fuel of data science. Data can come from databases and other repositories, but a lot of it is also published as web pages. Web scraping is the process of harvesting data from web pages. BeautifulSoup is a Python library for parsing HTML and XML documents and extracting data from them, which makes it a common tool for web scraping. In this guided project, you will use BeautifulSoup to scrape the contents of a web page.

4.4 (155 Reviews)

Language

  • English

Topic

  • Python

Enrollment Count

  • 681

Skills You Will Learn

  • Python, BeautifulSoup

Offered By

  • IBMSkillsNetwork

Estimated Effort

  • 30 minutes

Platform

  • SkillsNetwork

Last Update

  • April 26, 2024

About This Guided Project

Web scraping with BeautifulSoup is a popular way to extract data from websites and transform it into a structured format for analysis and manipulation. BeautifulSoup provides a simple, efficient way to parse HTML and XML documents, which makes it an essential tool for web scraping projects.
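
As a minimal sketch of what parsing looks like, the code below builds a BeautifulSoup object and walks its tree through tags, children, parents, and siblings. It assumes the beautifulsoup4 package is installed, uses Python's built-in "html.parser" so no extra parser is needed, and parses a made-up HTML snippet rather than any page used in the lab.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up HTML used only for illustration.
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h3><b id="featured">First heading</b></h3>
    <p>First paragraph</p>
    <h3>Second heading</h3>
    <p>Second paragraph</p>
  </body>
</html>
"""

# "html.parser" ships with Python, so no extra parser package is required.
soup = BeautifulSoup(html, "html.parser")

# Tag access: soup.<name> returns the first tag with that name.
title_text = soup.title.string      # the single string inside <title>
first_h3 = soup.h3                  # first <h3> anywhere in the document

# Children and parents: move down into <b>, then back up to its parent.
bold = first_h3.b
parent_name = bold.parent.name      # "h3"
bold_id = bold["id"]                # attributes behave like a dict: "featured"

# Siblings: the whitespace between tags is a text node, so
# find_next_sibling("p") skips directly to the next <p> tag.
first_para = first_h3.find_next_sibling("p").get_text()

# Direct children of <body>, keeping only tags (not whitespace strings).
body_children = [child.name for child in soup.body.children if isinstance(child, Tag)]
```

Attribute-style access such as soup.h3 only ever returns the first match, which is why the search methods find() and find_all() are the usual next step.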

In this guided project, you will learn how to download and scrape the contents of a web page so that you can extract and store specific information for further analysis.
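
One possible shape for the download, extract, and store workflow is sketched below. The function names are hypothetical, and the download step uses the standard library's urllib.request as an assumption (the lab environment may use a different HTTP client). Links and image sources are resolved to absolute URLs with urljoin, and the extracted links are written to a CSV file for later analysis.

```python
import csv
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def download_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with the standard library and decode it to text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html: str, base_url: str) -> list[dict]:
    """Return the text and absolute URL of every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"text": a.get_text(strip=True), "url": urljoin(base_url, a["href"])}
        for a in soup.find_all("a", href=True)
    ]


def extract_image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


def save_links_csv(links: list[dict], path: str) -> None:
    """Store the extracted links in a CSV file for later analysis."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "url"])
        writer.writeheader()
        writer.writerows(links)


# Usage (needs network access; the URL is only a placeholder):
# page_url = "https://example.com/"
# html = download_html(page_url)
# save_links_csv(extract_links(html, page_url), "links.csv")
```

Keeping the download separate from the parsing functions means the extraction logic can be tried on a saved HTML string without touching the network.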

First, you'll create a BeautifulSoup object and learn how to navigate its HTML structure using tags, children, parents, and siblings. Next, you'll locate elements in HTML files using filters and the find_all() and find() methods, and then extract the text or attributes of the elements you find. Finally, you'll download and scrape the contents of a web page, including images and data from HTML tables, and convert that data into a pandas DataFrame for further analysis.
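
The searching step described above can be sketched as follows. This assumes beautifulsoup4 is installed; the HTML, class names, and URLs are invented for illustration. It shows find() returning only the first match (or None), find_all() with tag-name, attribute, regular-expression, list, and function filters, and reading text and attributes from the results.

```python
import re

from bs4 import BeautifulSoup

# Invented markup for illustration only.
html = """
<div id="catalog">
  <a class="course" href="/courses/python">Python Basics</a>
  <a class="course" href="/courses/sql">SQL Basics</a>
  <a class="ad" href="/promo">Promo</a>
  <p class="note">Updated weekly</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find(): the first matching tag, or None when nothing matches.
note = soup.find("p", class_="note")
note_text = note.get_text(strip=True)

# find_all() with a tag name plus an attribute filter
# (class_ has a trailing underscore because "class" is a Python keyword).
courses = soup.find_all("a", class_="course")
course_titles = [a.get_text() for a in courses]
course_links = [a["href"] for a in courses]

# A regular expression as a filter: every tag whose href starts with /courses/.
regex_matches = soup.find_all(href=re.compile(r"^/courses/"))

# A list as a filter: any tag whose name is in the list.
mixed = soup.find_all(["a", "p"])


# A function as a filter: tags that have an href but not the "ad" class.
def is_non_ad_link(tag):
    return tag.has_attr("href") and "ad" not in tag.get("class", [])


non_ad = soup.find_all(is_non_ad_link)

# get() returns None instead of raising KeyError for a missing attribute.
missing = note.get("href")
```

Because class is a multi-valued attribute, BeautifulSoup returns it as a list, which is why the function filter checks membership with "in" rather than comparing strings.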

Complete this guided project to gain the experience you need to start scraping web pages successfully with BeautifulSoup.

A Look at the Project Ahead

After completing this project, you'll be able to:
  • Create a BeautifulSoup object
  • Extract information from HTML files
  • Download and scrape the contents of a web page
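
The last outcome in the list above often means turning an HTML table into something analyzable. A minimal sketch, assuming beautifulsoup4 and pandas are installed: the hypothetical helper table_to_dataframe takes the first row as column headers and each later row as a record. The table and its id are invented. pandas.read_html() is an alternative, but it relies on additional parser libraries being installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented table for illustration only.
html = """
<table id="scores">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>91</td></tr>
  <tr><td>Bob</td><td>78</td></tr>
</table>
"""


def table_to_dataframe(html: str, table_id: str) -> pd.DataFrame:
    """Convert the <table> with the given id into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no <table id={table_id!r}> found")
    rows = table.find_all("tr")
    # First row supplies the column names; the remaining rows are the data.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=header)


df = table_to_dataframe(html, "scores")
# Scraped cells are text, so convert numeric columns explicitly.
df["Score"] = pd.to_numeric(df["Score"])
```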

What You'll Need

For this project, you will need:
  • Familiarity with Python fundamentals
  • Familiarity with the basics of HTML
  • A web browser

Everything else is provided through the IBM Skills Network Labs environment, which gives you access to the Python service offered as part of that environment. The platform works best with current versions of modern browsers.

IBM Skills Network Labs provides everything you need to complete this project. However, if you are serious about data science, you should give IBM Watson® Studio a try. IBM Watson® Studio empowers data scientists, developers, and analysts to build, run, and manage AI models, and to optimize decisions anywhere on IBM Cloud Pak® for Data. Unite teams, automate AI lifecycles, and speed time to value on an open multi-cloud architecture. Get started with IBM Watson Studio free of charge.

Instructors

Joseph Santarcangelo

Senior Data Scientist at IBM

Joseph has a Ph.D. in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. He has worked at IBM since completing his Ph.D.
