/language-detection

language deteciton

Primary LanguagePython

language-detection

language deteciton program written in python

classifier

naive bayes classifier.

features

features are bigrams in the text.

model

pretrained model using bag of n-grams

usage

python src/language_detector.py test/test.txt

Language Support

70 Built-in Language Profiles

  1. af Afrikaans
  2. an Aragonese
  3. ar Arabic
  4. ast Asturian
  5. be Belarusian
  6. br Breton
  7. ca Catalan
  8. bg Bulgarian
  9. bn Bengali
  10. cs Czech
  11. cy Welsh
  12. da Danish
  13. de German
  14. el Greek
  15. en English
  16. es Spanish
  17. et Estonian
  18. eu Basque
  19. fa Persian
  20. fi Finnish
  21. fr French
  22. ga Irish
  23. gl Galician
  24. gu Gujarati
  25. he Hebrew
  26. hi Hindi
  27. hr Croatian
  28. ht Haitian
  29. hu Hungarian
  30. id Indonesian
  31. is Icelandic
  32. it Italian
  33. ja Japanese
  34. km Khmer
  35. kn Kannada
  36. ko Korean
  37. lt Lithuanian
  38. lv Latvian
  39. mk Macedonian
  40. ml Malayalam
  41. mr Marathi
  42. ms Malay
  43. mt Maltese
  44. ne Nepali
  45. nl Dutch
  46. no Norwegian
  47. oc Occitan
  48. pa Punjabi
  49. pl Polish
  50. pt Portuguese
  51. ro Romanian
  52. ru Russian
  53. sk Slovak
  54. sl Slovene
  55. so Somali
  56. sq Albanian
  57. sr Serbian
  58. sv Swedish
  59. sw Swahili
  60. ta Tamil
  61. te Telugu
  62. th Thai
  63. tl Tagalog
  64. tr Turkish
  65. uk Ukrainian
  66. ur Urdu
  67. vi Vietnamese
  68. yi Yiddish
  69. zh-cn Simplified Chinese
  70. zh-tw Traditional Chinese

reference

https://github.com/optimaize/language-detector