david-smejkal/wiki2txt
A tool to extract plain (unformatted) multilingual text, redirects, links, and categories from Wikipedia backups (dumps). Designed to prepare clean training data for AI and machine learning software.
Python · GPL-2.0
Issues (1)
Stuck at 95.45%
#1 opened by TaciteOFF