- Run `main.py` with the name of a supported website.
- To scrape the website to JSON, run with `-s`/`--scrape`.
- To format scraped data to JSON, run with `-f`/`--format`.
- Run without `-s` and `-f` to scrape and save only the formatted data.
- Specify the number of articles to scrape with `-n NUM`.
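
The command-line interface described above could be parsed roughly as follows. This is a minimal sketch assuming `argparse`; the flag names come from this README, but the actual implementation in `main.py` may differ.

```python
import argparse

def build_parser():
    # Sketch of the CLI from the README; help strings are paraphrased assumptions.
    p = argparse.ArgumentParser(description="Scrape a supported blog")
    p.add_argument("website", help="name of a supported website")
    p.add_argument("-s", "--scrape", action="store_true",
                   help="save raw scraped data to JSON")
    p.add_argument("-f", "--format", action="store_true",
                   help="format previously scraped data to JSON")
    p.add_argument("-n", type=int, default=None, metavar="NUM",
                   help="number of articles to scrape")
    return p

# Example invocation: scrape 10 articles from the primary blog to raw JSON.
args = build_parser().parse_args(["travelsmart", "-s", "-n", "10"])
```

Omitting both `-s` and `-f` leaves both flags `False`, which matches the "scrape and save only formatted data" default described above.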
- `web_instance.py` starts a debug server with all previously scraped data.
- `run.sh`:
  - scrapes our primary supported blog (travelsmart),
  - starts a server and opens it in the default browser,
  - proceeds to scrape all supported blogs; newly scraped data is automatically loaded in.

  An argument may be passed to specify the number of posts to scrape from each blog.
Web scraper: automatically gather information from selected websites (blogs).

- Develop the scraper using a Test Driven Development process.
- Process the data for subsequent usage (storage/access/search).
- Present the data through a simple frontend.
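
The Test Driven Development process mentioned above might start from a test like the one below. Both the function name `format_article` and the article's JSON shape are hypothetical illustrations, not the project's actual API: the test is written first, then the formatter is implemented to make it pass.

```python
def format_article(raw):
    # Hypothetical formatter: normalize one scraped article dict
    # before it is saved as formatted JSON.
    return {
        "title": raw.get("title", "").strip(),
        "url": raw["url"],
        "body": raw.get("body", ""),
    }

def test_format_article_normalizes_fields():
    raw = {"title": "  Hello  ", "url": "https://example.com/a"}
    out = format_article(raw)
    assert out["title"] == "Hello"       # surrounding whitespace stripped
    assert out["url"] == "https://example.com/a"
    assert out["body"] == ""             # missing fields get safe defaults

test_format_article_normalizes_fields()
```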