georgievnikolay/Web-Scraper

Python

Web Scraper for WordPress-based Blogs

By Julia, Kiril, Martina, and Nikolay

Usage:

Scraper and Formatter:

Run main.py with the name of a supported website.
To scrape the website to a JSON file, run with -s/--scrape.
To format scraped data to a JSON file, run with -f/--format.
Run with neither -s nor -f to scrape and save only the formatted data.
Specify the number of articles to scrape with -n NUM.
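
The options above could be parsed roughly as follows. This is a minimal sketch using the standard argparse module, not the project's actual main.py. The function name build_parser, the help strings, and the use of "travelsmart" as the website argument are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Positional argument: the name of a supported website.
    parser.add_argument("website", help="name of a supported website")
    # -s/--scrape and -f/--format are simple on/off switches.
    parser.add_argument("-s", "--scrape", action="store_true",
                        help="scrape the website to a JSON file")
    parser.add_argument("-f", "--format", action="store_true",
                        help="format scraped data to a JSON file")
    # -n NUM: number of articles to scrape; None means no limit was given.
    parser.add_argument("-n", type=int, metavar="NUM", default=None,
                        help="number of articles to scrape")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["travelsmart", "-s", "-n", "5"])
print(args.website, args.scrape, args.format, args.n)
```

With neither -s nor -f given, both args.scrape and args.format are False, which the script can treat as the "scrape and save only formatted data" case.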

Web App:

web_instance.py starts a debug server with all previously scraped data.
run.sh:
- scrapes our primary supported blog (travelsmart),
- starts a server and opens it in the default browser,
- proceeds to scrape all supported blogs.
Newly scraped data is loaded into the running server automatically.
Pass an argument to run.sh to set the number of posts scraped from each blog.
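
The order of steps in run.sh can be pictured with the sketch below. It only builds the command lines and does not run anything. It is an illustration, not a translation of the real script: the helper name plan_run, the choice to call main.py without -s or -f, and the placeholder blog names are all assumptions.

```python
import sys

# The README names travelsmart as the primary supported blog.
PRIMARY_BLOG = "travelsmart"


def plan_run(blogs, num_posts=None):
    """Return the commands for each step, in the order listed above.

    `blogs` is the list of supported blog names (hypothetical input);
    `num_posts` mirrors the optional argument to run.sh.
    """
    def scrape_cmd(blog):
        # Assumption: main.py is called with its default mode (no -s/-f).
        cmd = [sys.executable, "main.py", blog]
        if num_posts is not None:
            cmd += ["-n", str(num_posts)]
        return cmd

    steps = [scrape_cmd(PRIMARY_BLOG)]                  # 1. primary blog first
    steps.append([sys.executable, "web_instance.py"])   # 2. start the server
    steps += [scrape_cmd(blog) for blog in blogs]       # 3. all supported blogs
    return steps


# Placeholder blog names, for illustration only.
for step in plan_run(["blog_a", "blog_b"], num_posts=3):
    print(" ".join(step))
```

Scraping the primary blog before starting the server means the web app has some data to show as soon as the browser opens, while the remaining blogs are scraped afterwards.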

Supported blogs:

Task

Web scraper - automatically gather info from selected websites (blogs):

Develop a scraper using a Test Driven Development process.
Process the data for subsequent usage (storage/access/search).
Present the data through a simple frontend.
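
As an illustration of the test-driven approach, the sketch below pairs a small test case with the code that satisfies it. It is not taken from the project: the function extract_title, the TitleParser class, and the assumption that the post title sits in an element with the class "entry-title" (a convention in many WordPress themes, but not guaranteed for any particular blog) are all hypothetical. Only the standard library is used.

```python
import unittest
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first element with class "entry-title"."""

    def __init__(self):
        super().__init__()
        self._tag = None    # tag name of the title element while inside it
        self._depth = 0     # nesting depth of same-named tags
        self._done = False  # True once the first title has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._tag is None and not self._done and "entry-title" in classes:
            self._tag, self._depth = tag, 1
        elif self._tag is not None and tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._tag, self._done = None, True

    def handle_data(self, data):
        if self._tag is not None:
            self.parts.append(data)


def extract_title(html):
    """Return the post title, or None if no title element is found."""
    parser = TitleParser()
    parser.feed(html)
    title = "".join(parser.parts).strip()
    return title or None


class ExtractTitleTest(unittest.TestCase):
    # In a TDD workflow these tests would be written before extract_title.
    def test_title_inside_link(self):
        html = '<h1 class="entry-title"><a href="#">Hello World</a></h1>'
        self.assertEqual(extract_title(html), "Hello World")

    def test_missing_title(self):
        self.assertIsNone(extract_title("<p>no title here</p>"))
```

Saved to a file, the tests can be run with python -m unittest.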