/trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Primary Language: Python

License: Apache License 2.0 (Apache-2.0)
