Beautiful Soup Web Scraper

Demonstrations of scraping data from the web with Beautiful Soup, Scrapy, and Selenium.

Use Beautiful Soup to parse a page (scraper.py, follow-scraper.py)

Use Scrapy to create a spider that follows a provided list of links (horse.py)

Use Scrapy to create a spider that extracts links from a site and follows them (crawler.py)

Find forms and submit data via Scrapy

Scrape data from the World Bank API

Unit testing and testing with Selenium
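
As a minimal sketch of the first item above (parsing a page with Beautiful Soup), the example below parses an inline HTML string rather than a live site. The markup, the `HTML` constant, and the helper names `page_title` and `extract_links` are assumptions made for illustration; they are not taken from scraper.py.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
HTML = """
<html>
  <head><title>Horse Land</title></head>
  <body>
    <h1>Horses</h1>
    <ul class="breeds">
      <li><a href="/mustang.html">Mustang</a></li>
      <li><a href="/appaloosa.html">Appaloosa</a></li>
    </ul>
  </body>
</html>
"""


def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


def extract_links(html):
    """Return (link text, href) pairs for every anchor that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    print(page_title(HTML))
    for text, href in extract_links(HTML):
        print(text, href)
```

The built-in "html.parser" is used so that no extra parser needs to be installed. To run this against a real page, the HTML string would come from an HTTP response instead of the constant.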

DEPENDENCIES

Python 3.x
beautifulsoup4 package (installable through PyCharm)
Beautiful Soup 4 Docs
Scrapy
Selenium
Chrome Webdriver

KNOWN PROBLEMS WITH FIREFOX WEBDRIVER IN CATALINA

macOS 10.15 (Catalina):

Apple requires all programs on Catalina to be notarized, so geckodriver will not run if you download it manually through another notarized program, such as Firefox.

For now, it is best to use the Chrome webdriver instead.
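
A minimal sketch of driving Chrome from Selenium is shown below. It assumes Selenium 4 and a chromedriver that Selenium can locate (for example, on the PATH); the function name `first_heading` and the URL are placeholders for illustration, not part of this repository.

```python
def first_heading(url):
    """Open url in headless Chrome and return the text of its first <h1>."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.find_element(By.TAG_NAME, "h1").text
    finally:
        # Always close the browser, even if the lookup fails.
        driver.quit()


if __name__ == "__main__":
    # Placeholder URL; substitute the page you want to test.
    print(first_heading("https://example.com"))
```

Calling `driver.quit()` in a `finally` block keeps stray Chrome processes from piling up when a test fails partway through.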

MORE INFO

Examples from Treehouse, Scraping Data From the Web
PEP 20, The Zen of Python
Python 3 Docs
Beautiful Soup
Beautiful Soup 4 Docs
Example Website to Scrape, Horses
Example Horse Website Form via formspree
Scrapy
formspree.io
Scrapy FormRequest
World Bank API Docs
World Bank API, Requesting Country Data
World Bank Data - Ethiopia
Selenium Webdriver Docs
How to Install Chrome Webdriver on Mac
Gecko Webdriver for Firefox

