Web cleaner

Selects the linguistically relevant parts of HTML pages and converts them into plain text.

Author

Gwénolé Lecorvé, IRISA

Usage

perl html2txt.pl <html_file>
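
To process several pages at once, the command above can be wrapped in a small shell function. This is only a sketch: it assumes that html2txt.pl prints the extracted text to standard output, which this README does not state. The function name convert_all, the HTML2TXT variable, and the input/output directory layout are illustrative and not part of the tool.

```shell
#!/bin/sh
# Hypothetical batch wrapper around html2txt.pl.
# ASSUMPTION: the script writes the plain text to standard output.
# HTML2TXT can be overridden, e.g. to point at the script's full path.
HTML2TXT="${HTML2TXT:-perl html2txt.pl}"

# convert_all IN_DIR OUT_DIR
# Writes OUT_DIR/NAME.txt for every IN_DIR/NAME.html.
convert_all() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.html; do
        # Skip the literal pattern when no .html file matches.
        [ -e "$f" ] || continue
        base=$(basename "$f" .html)
        # Left unquoted on purpose so "perl html2txt.pl" splits into words.
        $HTML2TXT "$f" > "$out_dir/$base.txt"
    done
}
```

With this in place, calling convert_all pages text would write text/NAME.txt for each pages/NAME.html, under the stdout assumption above.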