/sa_wiki_text

Dump text from Sanskrit Wikipedia

Primary language: Jupyter Notebook. License: MIT.

Sanskrit Wikipedia Text Dump

This repo uses WikiExtractor to dump the text of Sanskrit Wikipedia. The zipped text dump, in XML format, is available here. The XML document format is specified here.

Please see this notebook for an example of how to use the data.
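
For a quick look at the data without opening the notebook, the sketch below shows one way to read the dump. It assumes the dump follows the usual WikiExtractor layout, where each article is wrapped in a `<doc id="..." url="..." title="...">` ... `</doc>` block; because such files typically have no single root element, it uses regular expressions rather than an XML parser. The names `Doc`, `iter_docs`, and `SAMPLE` are hypothetical and not part of this repo, and the sample text is a made-up placeholder rather than real dump content.

```python
import re
from typing import Iterator, NamedTuple


class Doc(NamedTuple):
    """One article from the dump (hypothetical container, not from this repo)."""
    id: str
    url: str
    title: str
    text: str


# Assumption: every article is wrapped in <doc ...> ... </doc>, as WikiExtractor emits.
DOC_RE = re.compile(r"<doc\s+([^>]*)>\n?(.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_docs(dump: str) -> Iterator[Doc]:
    """Yield one Doc per <doc> block found in the dump text."""
    for match in DOC_RE.finditer(dump):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        yield Doc(
            id=attrs.get("id", ""),
            url=attrs.get("url", ""),
            title=attrs.get("title", ""),
            text=match.group(2).strip(),
        )


# Made-up placeholder input in the assumed format; replace with the unzipped dump.
SAMPLE = """<doc id="1" url="https://sa.wikipedia.org/wiki?curid=1" title="संस्कृतम्">
संस्कृतम्

संस्कृतं जगतः एकतमा अतिप्राचीना भाषा।
</doc>
<doc id="2" url="https://sa.wikipedia.org/wiki?curid=2" title="भारतम्">
भारतम्

भारतं दक्षिण-एशियायां स्थितं राष्ट्रम्।
</doc>
"""

docs = list(iter_docs(SAMPLE))
for doc in docs:
    print(doc.id, doc.title, len(doc.text))
```

To run this on the real data, read the unzipped file with `open(path, encoding="utf-8").read()` and pass the result to `iter_docs`. If the actual dump deviates from the assumed layout, the two regular expressions are the only parts that need adjusting.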

As a hack, this repo uses Travis-CI to run the dump script periodically and update the dump.