wiki-to-plaintext

There is 1 repository under the wiki-to-plaintext topic.

  • david-smejkal/wiki2txt

    A tool to extract plain (unformatted) multilingual text, redirects, links, and categories from Wikipedia backups (dumps). Designed to prepare clean training data for AI and machine learning software.

    Language: Python