Language detection program written in Python.
It uses a naive Bayes classifier.
The features are bigrams in the text.
The pretrained model uses a bag of n-grams.
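
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/language_detector.py: it assumes character bigrams, a uniform prior over languages, and add-one smoothing, and the names `bigrams`, `train`, `detect`, `samples`, and `model` are hypothetical. The two toy training sentences stand in for the real pretrained model.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the overlapping character bigrams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(samples):
    """Build per-language bigram counts from (language_code, text) pairs."""
    counts = {}
    for lang, text in samples:
        counts.setdefault(lang, Counter()).update(bigrams(text))
    return counts


def detect(counts, text):
    """Return the language code with the highest naive Bayes log-score."""
    # Vocabulary of every bigram seen in any language (used for smoothing).
    vocab = set()
    for lang_counts in counts.values():
        vocab.update(lang_counts)

    grams = bigrams(text)
    best_lang, best_score = None, -math.inf
    for lang, lang_counts in counts.items():
        total = sum(lang_counts.values())
        # Uniform prior (assumption), so only the likelihood is summed.
        # Add-one smoothing; the extra +1 in the denominator leaves
        # probability mass for bigrams never seen in training.
        score = sum(
            math.log((lang_counts[g] + 1) / (total + len(vocab) + 1))
            for g in grams
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Toy training data (hypothetical); a real model needs far more text.
samples = [
    ("en", "the quick brown fox jumps over the lazy dog and this is the english text"),
    ("es", "el rápido zorro marrón salta sobre el perro perezoso y este es el texto en español"),
]
model = train(samples)

print(detect(model, "this is the house"))
print(detect(model, "el perro es marrón"))
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long inputs, and the smoothing keeps a single unseen bigram from ruling out a language.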
Usage: python src/language_detector.py test/test.txt
Supported languages:
- af Afrikaans
- an Aragonese
- ar Arabic
- ast Asturian
- be Belarusian
- br Breton
- ca Catalan
- bg Bulgarian
- bn Bengali
- cs Czech
- cy Welsh
- da Danish
- de German
- el Greek
- en English
- es Spanish
- et Estonian
- eu Basque
- fa Persian
- fi Finnish
- fr French
- ga Irish
- gl Galician
- gu Gujarati
- he Hebrew
- hi Hindi
- hr Croatian
- ht Haitian
- hu Hungarian
- id Indonesian
- is Icelandic
- it Italian
- ja Japanese
- km Khmer
- kn Kannada
- ko Korean
- lt Lithuanian
- lv Latvian
- mk Macedonian
- ml Malayalam
- mr Marathi
- ms Malay
- mt Maltese
- ne Nepali
- nl Dutch
- no Norwegian
- oc Occitan
- pa Punjabi
- pl Polish
- pt Portuguese
- ro Romanian
- ru Russian
- sk Slovak
- sl Slovene
- so Somali
- sq Albanian
- sr Serbian
- sv Swedish
- sw Swahili
- ta Tamil
- te Telugu
- th Thai
- tl Tagalog
- tr Turkish
- uk Ukrainian
- ur Urdu
- vi Vietnamese
- yi Yiddish
- zh-cn Simplified Chinese
- zh-tw Traditional Chinese