
Web-Scraping-Using-Python-BeautifulSoup

Scraping data from the web using the Python libraries BeautifulSoup and requests.

BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
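As a minimal illustration (the HTML snippet, tag names, and class name below are invented for this example), Beautiful Soup can parse an HTML string and navigate its tags directly:

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML document to parse
html = "<html><body><h1>Title</h1><p class='intro'>Hello</p></body></html>"

# Build the parse tree with Python's built-in html.parser
soup = BeautifulSoup(html, "html.parser")

print(soup.h1.text)                          # tag access by name -> "Title"
print(soup.find("p", class_="intro").text)   # search by tag and class -> "Hello"
```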

Requests

Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic.
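For instance, query strings can be passed as a plain dict and requests builds the URL for you (the URL and parameter here are placeholders; `Request.prepare()` is used so the example works without a network call):

```python
import requests

# requests encodes the params dict into the query string automatically
req = requests.Request("GET", "https://example.com/search", params={"q": "python"})
prepared = req.prepare()

print(prepared.url)  # https://example.com/search?q=python
```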

Documentation

Link: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Link: http://docs.python-requests.org/en/master/

Workflow

  1. Inspect the page
  2. Obtain the HTML
  3. Choose a parser (lxml, html5lib, html.parser)
  4. Create a BeautifulSoup object
  5. Extract the tags we need
  6. Store the data in lists
  7. Create a pandas DataFrame
  8. Export the scraped data to a CSV file
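The steps above can be sketched end to end. To keep the sketch self-contained, an inline HTML table stands in for a live page, and the table layout, column names, and output filename are all invented for this example; in a real run you would fetch the HTML with requests instead:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Steps 1-2: in a real run you would obtain the HTML from a live page, e.g.:
#   import requests
#   html = requests.get("https://example.com/books").text
# Here we use an inline snippet so the sketch runs offline.
html = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Book A</td><td>10.99</td></tr>
  <tr><td>Book B</td><td>5.49</td></tr>
</table>
"""

# Steps 3-4: choose a parser and create the BeautifulSoup object
soup = BeautifulSoup(html, "html.parser")

# Steps 5-6: extract the tags we need and store the data in lists
titles, prices = [], []
for row in soup.find_all("tr")[1:]:   # skip the header row
    cells = row.find_all("td")
    titles.append(cells[0].text)
    prices.append(float(cells[1].text))

# Step 7: create a pandas DataFrame
df = pd.DataFrame({"Title": titles, "Price": prices})

# Step 8: export the scraped data to a CSV file (written to the current directory)
df.to_csv("scraped.csv", index=False)
```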

Usage

  • Run jupyter notebook in a terminal and the notebook will open in your browser.

    Install Jupyter first if you haven't already.

  • Install BeautifulSoup with pip install beautifulsoup4 in the command prompt / Anaconda prompt if you haven't already.

Packages used:

- from bs4 import BeautifulSoup
- import requests
- import pandas as pd

Contributing

Pull requests are always welcome. For major changes, please contact me on my LinkedIn account https://www.linkedin.com/in/rahulsisodia06/