/trafilatura

Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

Primary language: Python

License: GNU General Public License v3.0 (GPL-3.0)
