/wikilanguages

Dataset of popular Wikipedia articles across 41 languages.

Primary language: Python

Files

features.pkl.bz2: All of the training data needed to create language classifiers. This data is released under the Creative Commons Attribution-ShareAlike 3.0 license (CC BY-SA 3.0): http://creativecommons.org/licenses/by-sa/3.0/

example.py: Example code for generating language classifiers.

lang_map.py: Mappings from language codes to language names.

wiki_attribution.txt: Each line of this file contains the title of a page in the features.pkl.bz2 dataset and a link to that page.
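
Loading the data: the file name features.pkl.bz2 suggests a Python pickle compressed with bzip2. The following is a minimal sketch under that assumption; load_features is a hypothetical helper name and is not taken from example.py, and the structure of the unpickled object is not documented here, so inspect it after loading.

```python
import bz2
import pickle


def load_features(path="features.pkl.bz2"):
    """Decompress and unpickle the file at `path` (hypothetical helper).

    Assumes the file is a bzip2-compressed pickle, as its name suggests.
    """
    # bz2.open decompresses transparently when reading in binary mode.
    with bz2.open(path, "rb") as handle:
        # If the pickle was written by Python 2, pickle.load may need
        # encoding="latin1" (an assumption; adjust if loading fails).
        return pickle.load(handle)


# Example (not executed here, since it needs the data file):
# data = load_features("features.pkl.bz2")
# print(type(data))
```

Unpickling executes code embedded in the file, so load only a copy of features.pkl.bz2 obtained from a source you trust.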