This repository covers the fundamental ideas of web scraping and offers practice exercises for learning the tools.

Primary language: Jupyter Notebook. License: MIT.

Web Scraping Fundamentals

This repository introduces the fundamental concepts of web scraping and provides practice with several Python web-scraping packages. It also includes a brief introduction to API requests for data collection.

Section 1: Web Scraping Basics

  1. Request and Parsing
  2. Explore HTML Structure
  3. Isolate Data
  4. Preparing for Paginated Scraping
  5. Scraping Paginated Content
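
As a rough illustration of the workflow this section walks through (parse a page, isolate the data, find the link to the next page, repeat), here is a minimal sketch. It uses only the Python standard library, which may differ from the packages used in the notebooks. It assumes markup in the style of Quotes to Scrape, with quote text in span elements of class "text" and the next-page link inside a list item of class "next". The names QuoteParser, parse_page, scrape_all, and SAMPLE, along with the sample HTML, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class QuoteParser(HTMLParser):
    """Collects the text of <span class="text"> elements and the next-page link."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self.next_href = None
        self._in_quote = False
        self._in_next = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "text" in classes:
            self._in_quote = True
            self.quotes.append("")
        elif tag == "li" and "next" in classes:
            self._in_next = True
        elif tag == "a" and self._in_next:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_quote = False
        elif tag == "li":
            self._in_next = False

    def handle_data(self, data):
        if self._in_quote:
            self.quotes[-1] += data


def parse_page(html):
    """Return (list of quote strings, href of the next page or None)."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return [q.strip() for q in parser.quotes], parser.next_href


def scrape_all(start_url, max_pages=3):
    """Follow next-page links from start_url, visiting at most max_pages pages."""
    url, results = start_url, []
    for _ in range(max_pages):
        with urlopen(url) as response:
            html = response.read().decode("utf-8")
        quotes, next_href = parse_page(html)
        results.extend(quotes)
        if not next_href:
            break
        url = urljoin(url, next_href)  # resolve relative links such as /page/2/
    return results


# Hypothetical sample markup, so the parsing step can be tried offline.
SAMPLE = """
<div class="quote"><span class="text">First quote.</span></div>
<div class="quote"><span class="text">Second quote.</span></div>
<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>
"""

if __name__ == "__main__":
    print(parse_page(SAMPLE))
```

Only scrape_all touches the network. The max_pages cap keeps the crawl short while experimenting, and parse_page can be exercised on saved HTML first.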

Section 2: Automate Web Browsing with Selenium

  1. Automating Web Browsing
  2. Basic Browser Interactions
  3. Handling Drag and Drop
  4. Selenium Implicit Wait Functions
  5. Selenium Explicit Wait Functions
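
The difference between the two wait styles listed above can be sketched as follows. An implicit wait is a single session-wide setting, while an explicit wait blocks until one specific condition holds. This is a minimal sketch, not the notebooks' code: the helper names set_implicit_wait, wait_for_element, and demo are hypothetical, and the URL and the "div.quote" selector are assumptions about a JavaScript-rendered practice page. The Selenium documentation advises against mixing the two styles in one session, so demo uses only the explicit wait.

```python
def set_implicit_wait(driver, seconds=10):
    """Implicit wait: a session-wide setting; each element lookup retries for up to `seconds`."""
    driver.implicitly_wait(seconds)
    return driver


def wait_for_element(driver, css_selector, timeout=10):
    """Explicit wait: block until this one element is present, else raise TimeoutException."""
    # Imported inside the function so the file loads even where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, css_selector)
    return WebDriverWait(driver, timeout).until(EC.presence_of_element_located(locator))


def demo():
    """Not called automatically: opens a real browser and needs network access."""
    from selenium import webdriver

    driver = webdriver.Chrome()  # needs Chrome and a matching ChromeDriver
    try:
        driver.get("https://quotes.toscrape.com/js/")  # assumed JavaScript-rendered page
        first_quote = wait_for_element(driver, "div.quote")  # assumed selector
        print(first_quote.text)
    finally:
        driver.quit()
```

Calling demo() from a notebook cell would launch the browser. The explicit wait is usually the more precise choice because it names the element the next step depends on.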

Section 3: Automating with APIs

  1. Create API Requests
  2. Parsing through JSON
  3. Using API Keys
  4. Linking API Calls
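
The four steps above can be sketched with the standard library alone: build a request that carries an API key, parse the JSON response, and feed a value from one response into a second call. Everything specific here is an assumption made for illustration, including the api.example.com base URL, the X-API-Key header name, the EXAMPLE_API_KEY environment variable, the endpoint paths, and the response shape with a "results" list. A real provider such as Rapid API documents its own header names and endpoints.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.example.com"  # hypothetical API, for illustration only


def load_api_key():
    """Read the key from the environment instead of hard-coding it in a notebook."""
    return os.environ["EXAMPLE_API_KEY"]  # hypothetical variable name


def build_request(path, api_key, **params):
    """Create a GET request with query parameters and the key in a header."""
    url = f"{BASE_URL}{path}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"X-API-Key": api_key, "Accept": "application/json"})


def fetch_json(request):
    """Send the request and decode the JSON body into Python objects."""
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def first_author_books(api_key, name, fetch=fetch_json):
    """Linked calls: search for an author, then use the returned id in a second call."""
    search = fetch(build_request("/authors", api_key, q=name))
    if not search["results"]:
        return []
    author_id = search["results"][0]["id"]
    books = fetch(build_request(f"/authors/{author_id}/books", api_key))
    return [book["title"] for book in books["results"]]
```

Passing fetch as a parameter is a design choice that lets the linking logic be tried with canned responses before any real key or network access is involved.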

Resources:

Selenium | Quotes to Scrape | Scraping Club | ChromeDriver | GeckoDriver | Rapid API

Note: This repository includes a challenge exercise and its solution. We suggest that beginners experiment with the tools and practice on real-world examples. The best way to learn a tool is to use it!

Copyright © 2020 Norman Lo