/wp2txt

WP2TXT extracts plain text from Wikipedia dump files (XML, compressed with Bzip2), stripping all MediaWiki markup and other metadata.
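To give a rough idea of what "stripping MediaWiki markup" involves, here is a minimal Ruby sketch. It is not WP2TXT's implementation or API: `strip_wiki_markup` is a hypothetical helper written for illustration, it handles only a few common constructs (comments, templates, internal links, bold/italic quotes, headings), and it assumes the text has already been decompressed and pulled out of the XML.

```ruby
# Hypothetical helper for illustration only; not part of WP2TXT.
# Removes a handful of common MediaWiki constructs from a string.
def strip_wiki_markup(text)
  out = text.dup

  # Drop HTML comments such as <!-- note -->.
  out.gsub!(/<!--.*?-->/m, "")

  # Drop templates such as {{cite ...}}. Removing innermost templates
  # repeatedly handles nesting; gsub! returns nil once nothing is left.
  loop { break unless out.gsub!(/\{\{[^{}]*\}\}/m, "") }

  # Turn [[target|label]] into "label" and [[target]] into "target".
  out.gsub!(/\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]/, '\1')

  # Remove bold/italic quote runs ('' and ''').
  out.gsub!(/'{2,}/, "")

  # Turn "== Heading ==" into "Heading".
  out.gsub!(/^=+[ \t]*(.*?)[ \t]*=+[ \t]*$/, '\1')

  out
end

puts strip_wiki_markup("'''Ruby''' is a [[programming language|language]].{{cite}}")
```

A real dump contains many more constructs (tables, references, files, nested links), so a regex pass like this is only a starting point.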

Primary language: Ruby. License: MIT.
