Web scraping is the use of a program or algorithm to extract and process large amounts of data from the web. Whether you are a data scientist, an engineer, or anyone who analyzes large datasets, the ability to scrape data from the web is a useful skill. Suppose you find data on the web and there is no direct way to download it: web scraping with Python lets you extract that data into a useful form that can be imported.
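# Libraries for data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Libraries for fetching and parsing HTML
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Download the page and parse it with the lxml parser
url = "http://www.xyz.com"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
# Check the type of the parsed object
type(soup)
# Output: bs4.BeautifulSoup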
The soup object lets you extract information about the page you are scraping, such as its title, as shown below.
# Get the title
title = soup.title
print(title)
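To try the same idea without a network connection, here is a minimal sketch that parses a small inline HTML string instead of a downloaded page. The `sample_html` content and its link paths are made up for illustration, and the sketch assumes Python's built-in `html.parser` rather than `lxml`, so that it runs with only `bs4` installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page, so the example runs offline.
sample_html = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <h1>Example Heading</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

# "html.parser" ships with Python; 'lxml' should work the same way here if it is installed.
soup = BeautifulSoup(sample_html, "html.parser")

# soup.title is the <title> tag itself; get_text() returns only the text inside it.
title_tag = soup.title
title_text = soup.title.get_text()

# find_all("a") returns every <a> tag; get("href") reads each tag's href attribute.
links = [a.get("href") for a in soup.find_all("a")]

print(title_tag)   # <title>Example Page</title>
print(title_text)  # Example Page
print(links)       # ['/first', '/second']
```

Printing the tag shows the surrounding markup, while `get_text()` gives only the text, which is usually what you want before loading scraped values into a table.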