Scrape-app-api

Web scraping is the use of a program to extract and process large amounts of data from the web. Whether you are a data scientist, an engineer, or anyone else who analyzes large datasets, the ability to scrape data from the web is a useful skill. When you find data on the web and there is no direct way to download it, web scraping with Python lets you extract it into a form you can import and analyze.

Web Scraping using Beautiful Soup

  • import pandas as pd
  • import numpy as np
  • import matplotlib.pyplot as plt
  • import seaborn as sns
  • from urllib.request import urlopen
  • from bs4 import BeautifulSoup
  • url = "http://www.xyz.com"
  • html = urlopen(url)
  • soup = BeautifulSoup(html, 'lxml')
  • type(soup)  # bs4.BeautifulSoup
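The fetch step above calls urlopen with no timeout and passes the raw response straight to the parser. As a minimal sketch of a slightly more defensive fetch, the helper below sets a timeout, sends a User-Agent header, and decodes the body using the charset the server reports. The name fetch_html, the 10-second default, and the User-Agent string are assumptions made for illustration, not part of this project.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Download `url` and return the response body as text.

    Hypothetical helper for illustration; the header value and the
    default timeout are arbitrary choices.
    """
    request = Request(url, headers={"User-Agent": "scrape-app-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage with the placeholder URL from the list above (left commented out
# so that nothing touches the network on import):
# html = fetch_html("http://www.xyz.com")
# soup = BeautifulSoup(html, "lxml")
```

The returned string can be passed to BeautifulSoup in place of the response object. Network failures raise urllib.error.URLError, which the caller can catch if it wants to retry or skip the page.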

The soup object lets you extract information from the page you are scraping. For example, you can get the page title as shown below.

  
# Get the title
title = soup.title  # the <title> tag object, or None if the page has none
print(title)
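To try the same calls without a network connection, here is a minimal sketch that parses an inline HTML string. The sample markup is made up for illustration, and it uses Python's built-in "html.parser" rather than 'lxml' so that nothing beyond Beautiful Soup needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up sample page standing in for a downloaded document.
SAMPLE_HTML = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <h1>Results</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

title_tag = soup.title          # the whole <title> tag
title_text = soup.title.string  # only the text inside the tag

# Collect the href attribute of every <a> tag on the page.
links = [a.get("href") for a in soup.find_all("a")]

# All visible text, with whitespace collapsed between elements.
page_text = soup.get_text(separator=" ", strip=True)

print(title_tag)
print(title_text)
print(links)
```

Printing soup.title shows the tag with its markup, while soup.title.string gives just the text, which is usually what you want to store.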