Mihailorama/ScrapperWeb

Python

WebScraper

The service parses the pages of the provided resource.

/scrape - full resource parsing
/update - scrape only those pages that are not listed in the filtered links file
/fail_urls - get a list of failed links
/scrape_one_page - parse only one specific page of the resource
/add_custom_page - add page data to the parsed data file
/load_data - get the contents of the parsed data file
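
A minimal sketch of how a client might address these endpoints. The base URL, the use of plain GET requests, and the helper names `endpoint_url` and `call` are assumptions made for illustration; the actual service may expect other HTTP methods or request parameters (for example, a page URL for /scrape_one_page).

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Endpoint paths as listed above.
ENDPOINTS = (
    "scrape",
    "update",
    "fail_urls",
    "scrape_one_page",
    "add_custom_page",
    "load_data",
)


def endpoint_url(base_url: str, name: str) -> str:
    """Build the full URL for one of the listed endpoints."""
    if name not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {name}")
    # Ensure exactly one slash between the base URL and the endpoint name.
    return urljoin(base_url.rstrip("/") + "/", name)


def call(base_url: str, name: str, timeout: float = 30.0) -> str:
    """Send a GET request (assumed method) and return the response body as text."""
    with urlopen(endpoint_url(base_url, name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `call("http://localhost:8000", "fail_urls")` would request the list of failed links, assuming the service runs locally on port 8000.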

check_scraper.py - a script that compares the list of URLs from the last parsing run with the resource's current sitemap and reports the difference
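
The comparison could look roughly like the sketch below. It is not the actual contents of check_scraper.py: the function names `sitemap_urls` and `diff_urls` are hypothetical, and the sitemap is assumed to follow the standard sitemaps.org XML format.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> set[str]:
    """Collect every <loc> URL from a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    }


def diff_urls(parsed_urls, current_urls) -> dict[str, list[str]]:
    """Compare the previously parsed URLs with the current sitemap URLs."""
    parsed, current = set(parsed_urls), set(current_urls)
    return {
        "new": sorted(current - parsed),      # in the sitemap, not parsed yet
        "removed": sorted(parsed - current),  # parsed earlier, gone from the sitemap
    }
```

URLs under "new" would be candidates for /update, while those under "removed" are pages that no longer appear in the sitemap.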