
Web-Scraping-Using-Python-BeautifulSoup

Scraping data from the web using the Python libraries BeautifulSoup and requests.

BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
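As a minimal illustration (the HTML snippet, tag names, and class name below are invented for this example), Beautiful Soup can parse an HTML string and navigate its tags directly:

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML document to parse
html = "<html><body><h1>Title</h1><p class='intro'>Hello</p></body></html>"

# Build the parse tree with Python's built-in html.parser
soup = BeautifulSoup(html, "html.parser")

print(soup.h1.text)                          # tag access by name -> "Title"
print(soup.find("p", class_="intro").text)   # search by tag and class -> "Hello"
```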

Requests

Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic.
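For instance, query strings can be passed as a plain dict and requests builds the URL for you (the URL and parameter here are placeholders; `Request.prepare()` is used so the example works without a network call):

```python
import requests

# requests encodes the params dict into the query string automatically
req = requests.Request("GET", "https://example.com/search", params={"q": "python"})
prepared = req.prepare()

print(prepared.url)  # https://example.com/search?q=python
```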

Documentation

Link: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Link: http://docs.python-requests.org/en/master/

Workflow

  1. Inspect the page
  2. Obtain the HTML
  3. Choose a parser (lxml, html5lib, html.parser)
  4. Create a BeautifulSoup object
  5. Extract the tags we need
  6. Store the data in lists
  7. Create a pandas DataFrame
  8. Export the scraped data to a CSV file
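The steps above can be sketched end to end. To keep the sketch self-contained, an inline HTML table stands in for a live page, and the table layout, column names, and output filename are all invented for this example; in a real run you would fetch the HTML with requests instead:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Steps 1-2: in a real run you would obtain the HTML from a live page, e.g.:
#   import requests
#   html = requests.get("https://example.com/books").text
# Here we use an inline snippet so the sketch runs offline.
html = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Book A</td><td>10.99</td></tr>
  <tr><td>Book B</td><td>5.49</td></tr>
</table>
"""

# Steps 3-4: choose a parser and create the BeautifulSoup object
soup = BeautifulSoup(html, "html.parser")

# Steps 5-6: extract the tags we need and store the data in lists
titles, prices = [], []
for row in soup.find_all("tr")[1:]:   # skip the header row
    cells = row.find_all("td")
    titles.append(cells[0].text)
    prices.append(float(cells[1].text))

# Step 7: create a pandas DataFrame
df = pd.DataFrame({"Title": titles, "Price": prices})

# Step 8: export the scraped data to a CSV file (written to the current directory)
df.to_csv("scraped.csv", index=False)
```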

Usage

  • Run jupyter notebook in a terminal and the notebook will open in your browser.

    Install Jupyter first if you haven't already.

  • Install BeautifulSoup with pip install beautifulsoup4 in the command prompt / Anaconda prompt if you haven't already.

Packages used:

- from bs4 import BeautifulSoup
- import requests
- import pandas as pd

Contributing

Pull requests are always welcome. For major changes, please contact me on my LinkedIn account https://www.linkedin.com/in/rahulsisodia06/