zero-to-mastery/breads-server

Update article scraper

Closed this issue · 4 comments

Bloomberg still flags the scraper as a robot.
Other websites don't return an article title.
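For the missing-title case, one common workaround is to check several places where a page may declare its title instead of relying on a single one. Below is a minimal sketch using only Python's standard-library `html.parser`. The preference order (Open Graph `og:title`, then `<title>`, then the first `<h1>`) is an assumption for illustration, not how the breads-server scraper currently works, and `extract_title` is a hypothetical helper name. It does nothing about the Bloomberg robot check.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleCandidates(HTMLParser):
    """Collects possible article titles from <meta property="og:title">,
    <title>, and the first non-empty <h1> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.og_title: Optional[str] = None
        self.title: Optional[str] = None
        self.h1: Optional[str] = None
        self._capture: Optional[str] = None  # tag whose text is being read
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title" and self.og_title is None:
            self.og_title = (attrs.get("content") or "").strip() or None
        elif tag in ("title", "h1") and getattr(self, tag) is None and self._capture is None:
            self._capture = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            # Collapse whitespace; nested tags (e.g. <em>) contribute their text too.
            text = " ".join("".join(self._buffer).split())
            setattr(self, tag, text or None)
            self._capture = None


def extract_title(html: str) -> Optional[str]:
    """Hypothetical helper: return the first title found, in an assumed preference order."""
    parser = TitleCandidates()
    parser.feed(html)
    parser.close()
    return parser.og_title or parser.title or parser.h1
```

`og:title` is tried first here because it usually omits the " | Site Name" suffix that many `<title>` tags carry; swapping the order is a one-line change.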

Dragnet looks promising
https://github.com/dragnet-org/dragnet

Consider layering several scrapers, since each may have different strengths, and falling back to an API if they all fail:
https://mercury.postlight.com/web-parser/
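A minimal sketch of the layering idea, assuming each scraper is a plain function that takes a URL and returns an article dict (or raises / returns `None` on failure). The scraper functions and `api_fallback` are hypothetical placeholders: in practice one might wrap Dragnet or an external parsing service behind them. The acceptance rule ("a result counts only if it has a non-empty title") is also an assumption, chosen because missing titles are the failure reported above.

```python
from typing import Callable, Optional, Sequence

# Hypothetical extractor type: URL in, article dict out (or None on failure).
Extractor = Callable[[str], Optional[dict]]


def is_usable(article: Optional[dict]) -> bool:
    # Assumed acceptance rule: a result only counts if it has a non-empty title.
    return bool(article and article.get("title"))


def scrape_layered(url: str, scrapers: Sequence[Extractor],
                   api_fallback: Extractor) -> Optional[dict]:
    """Try each local scraper in order; call the external API only if all fail."""
    for scraper in scrapers:
        try:
            article = scraper(url)
        except Exception:
            # A scraper that crashes (blocked request, parse error) passes the turn on.
            continue
        if is_usable(article):
            return article
    try:
        article = api_fallback(url)
    except Exception:
        return None
    return article if is_usable(article) else None
```

Putting the API last keeps external calls (and any rate limits or costs) to the cases the local scrapers can't handle; the order of `scrapers` can be tuned per site if one scraper proves better on certain domains.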

Using cached data for now; will keep looking at other options.