georgegebbett/recipe-buddy

Scraper error reporting

georgegebbett opened this issue · 0 comments

As I have discovered over the course of this project, recipe metadata is not embedded in pages in any uniform way. So far, failures have been handled by users reporting on GitHub/Reddit that a page scrape has failed, after which the scraper logic is updated.

Ideally I would like unscrapable pages to be reported back automatically, so that trends can be identified and the scraper updated.

The way I see this working: the user opts in, and this preference is stored on their user object in Mongo. When they then encounter a scraping failure, the backend sends a request containing the URL of the failed page to some kind of central API. These URLs are stored somewhere for review, and the scraper can be updated to handle them.
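A minimal sketch of what the opt-in reporting step could look like on the backend. All names here (`ScraperUser`, `reportScrapeFailure`, the endpoint URL) are illustrative assumptions, not existing Recipe Buddy code; the `send` function is injectable so the central API call can be stubbed out in tests.

```typescript
// Hypothetical sketch of the opt-in scrape-failure reporting flow.
// None of these names exist in the codebase yet.

interface ScraperUser {
  id: string;
  // Opt-in flag, stored on the user object in Mongo
  reportScrapeFailures: boolean;
}

// Placeholder for the central reporting API
const REPORT_ENDPOINT = "https://example.com/api/scrape-failures";

// Default transport: POST the report as JSON
async function defaultSend(endpoint: string, body: unknown): Promise<void> {
  await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Called when a scrape fails. Returns true if a report was sent,
// false if the user has not opted in.
async function reportScrapeFailure(
  user: ScraperUser,
  failedUrl: string,
  send: (endpoint: string, body: unknown) => Promise<void> = defaultSend
): Promise<boolean> {
  if (!user.reportScrapeFailures) return false; // user has not opted in
  await send(REPORT_ENDPOINT, {
    url: failedUrl,
    reportedAt: new Date().toISOString(),
  });
  return true;
}
```

Keeping the check on the user object (rather than in the frontend) means a user who opts out never has their failed URLs sent anywhere, which matters since recipe URLs can reveal browsing habits.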