WebScraping

Basics of Web Scraping with Python

The Jupyter notebooks demonstrate simple methods for web scraping with the following Python libraries:

  • Pandas
  • Requests
  • Beautiful Soup

The Python library Pandas

Pandas makes it easy to scrape a table (a <table> tag) on a web page. Once the table is loaded as a DataFrame, you can process it further and save it as an Excel or CSV file. Find more information here: https://pythonbasics.org/pandas-web-scraping/
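As a minimal sketch of the idea above: pandas.read_html returns one DataFrame per table it finds. The HTML snippet, column names and values below are made up for illustration, and the example parses an in-memory string so that it runs without network access. It assumes an HTML parser that pandas can use (lxml, or bs4 together with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content; in a notebook this would usually come from a URL.
html = """
<table>
  <thead>
    <tr><th>fruit</th><th>price</th></tr>
  </thead>
  <tbody>
    <tr><td>apple</td><td>1.20</td></tr>
    <tr><td>pear</td><td>0.80</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> found.
tables = pd.read_html(StringIO(html))
df = tables[0]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(df)
print(csv_text)
```

Passing a file name, for example df.to_csv("fruits.csv", index=False), writes the CSV to disk instead. df.to_excel works the same way for Excel files but needs an extra engine such as openpyxl.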

The Python library Requests

Requests allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, thanks to urllib3. Find more information here: https://requests.readthedocs.io/en/latest/
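The query-string and form-encoding behaviour described above can be seen without sending anything, by preparing a request and inspecting it. This is only a sketch: example.com and the parameter names are placeholders, not taken from the notebooks.

```python
import requests

# Query parameters are passed as a dict; Requests builds the query string.
get_request = requests.Request(
    "GET",
    "https://example.com/search",  # placeholder URL
    params={"q": "web scraping", "page": 2},
)
prepared_get = get_request.prepare()
print(prepared_get.url)

# Form data is also passed as a dict; Requests form-encodes the body
# and sets the Content-Type header accordingly.
post_request = requests.Request(
    "POST",
    "https://example.com/login",  # placeholder URL
    data={"user": "alice", "lang": "py"},
)
prepared_post = post_request.prepare()
print(prepared_post.body)
print(prepared_post.headers["Content-Type"])

# To actually send a prepared request, use a Session (not executed here):
# with requests.Session() as session:
#     response = session.send(prepared_get, timeout=10)
#     response.raise_for_status()
```

In everyday use the shortcut requests.get(url, params=..., timeout=10) does the preparing and sending in one call.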

The Python library Beautiful Soup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work. Find more information here: https://beautiful-soup-4.readthedocs.io/en/latest/
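A minimal sketch of navigating, searching and modifying a parse tree follows. The HTML string is invented for illustration, and the example uses Python's built-in "html.parser" so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; normally this would be the text of an HTTP response.
html = """
<html>
  <body>
    <h1>Fruit prices</h1>
    <ul>
      <li><a href="/apple">Apple</a></li>
      <li><a href="/pear">Pear</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access returns the first matching tag.
title = soup.h1.get_text(strip=True)

# Searching: find_all returns every matching tag in document order.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

# Modifying: assigning to .string replaces the tag's text.
soup.h1.string = "Updated fruit prices"

print(title)
print(links)
print(soup.h1)
```

When scraping a live page, the same calls apply to BeautifulSoup(response.text, "html.parser"), where response is the result of a Requests call.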

ENJOY YOUR WEB PLAYGROUND DAY!