Basics of Web Scraping with Python
The Jupyter notebooks demonstrate simple methods for web scraping using the following Python libraries:
- Pandas
- Requests
- Beautiful Soup
Pandas makes it easy to scrape a table (`<table>` tag) on a web page. After obtaining it as a DataFrame, it is of course possible to do further processing and save it as an Excel or CSV file. Find more information here: https://pythonbasics.org/pandas-web-scraping/
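A minimal sketch of this approach: `pd.read_html` returns a list of DataFrames, one per table found on the page (the URL below is just an illustrative example, and a parser such as lxml must be installed).

```python
import pandas as pd

# Example URL; any page containing an HTML <table> works.
url = "https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)"

# read_html returns a list of DataFrames, one per <table> on the page.
tables = pd.read_html(url)
df = tables[0]

# Save the first table as a CSV file (df.to_excel would produce an Excel file).
df.to_csv("table.csv", index=False)
```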
Requests allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, thanks to urllib3. Find more information here: https://requests.readthedocs.io/en/latest/
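For example, query-string parameters can be passed as a dictionary and Requests encodes them for you (httpbin.org is used here purely as a test endpoint):

```python
import requests

# Parameters are passed as a dict; Requests builds the query string automatically.
response = requests.get("https://httpbin.org/get", params={"q": "web scraping"})

# Raise an exception for 4xx/5xx status codes.
response.raise_for_status()

print(response.status_code)  # e.g. 200
print(response.url)          # the final URL with the encoded query string
print(response.json())       # parsed JSON body of the response
```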
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work. Find more information here: https://beautiful-soup-4.readthedocs.io/en/latest/
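A short sketch of combining Requests and Beautiful Soup, using Python's built-in html.parser (the URL is only an example):

```python
import requests
from bs4 import BeautifulSoup

# Fetch a page and parse its HTML.
html = requests.get("https://pythonbasics.org/").text
soup = BeautifulSoup(html, "html.parser")

# Navigate and search the parse tree.
print(soup.title.string)          # the page title
for link in soup.find_all("a"):   # every anchor tag on the page
    print(link.get("href"))
```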