test_name , en , fr , de , nl , ro
alchemy_api-twitter_tests , 100.0, 95.0 , 100.0, 85.0 , 85.0
google_api-twitter_tests , 100.0, 100.0, 100.0, 100.0, 95.0
language_detector-twitter_tests , 65.0 , 80.0 , 100.0, 95.0 , 55.0
language_detector_tc-twitter_tests, 85.0 , 100.0, 85.0 , 95.0 , 65.0
whatlanguage-twitter_tests , 95.0 , 90.0 , 90.0 , 90.0 , 0.0
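
The table gives one score per detector and language. Assuming each cell is the percentage of test tweets in that language that the detector labelled correctly (the notes do not say so explicitly), a short Python sketch can average each row into a single overall score. `RESULTS` is the table above pasted verbatim; `mean_scores` is a hypothetical helper, not part of any of the tested libraries.

```python
import csv
import io

# The results table above, pasted verbatim (cells are padded with spaces).
RESULTS = """\
test_name , en , fr , de , nl , ro
alchemy_api-twitter_tests , 100.0, 95.0 , 100.0, 85.0 , 85.0
google_api-twitter_tests , 100.0, 100.0, 100.0, 100.0, 95.0
language_detector-twitter_tests , 65.0 , 80.0 , 100.0, 95.0 , 55.0
language_detector_tc-twitter_tests, 85.0 , 100.0, 85.0 , 95.0 , 65.0
whatlanguage-twitter_tests , 95.0 , 90.0 , 90.0 , 90.0 , 0.0
"""


def mean_scores(table: str) -> dict[str, float]:
    """Map each test name to the unweighted mean of its per-language scores."""
    rows = csv.reader(io.StringIO(table))
    next(rows)  # skip the header row (test_name, en, fr, ...)
    means = {}
    for row in rows:
        if not row:
            continue
        name = row[0].strip()
        scores = [float(cell) for cell in row[1:]]  # float() ignores the padding
        means[name] = sum(scores) / len(scores)
    return means


# Print detectors from best to worst mean score.
for name, mean in sorted(mean_scores(RESULTS).items(), key=lambda kv: -kv[1]):
    print(f"{name:<36}{mean:5.1f}")
```

With these numbers `google_api-twitter_tests` comes out on top at 99.0 and `whatlanguage-twitter_tests` last at 73.0, largely because of its 0.0 on `ro`. An unweighted mean treats all five languages as equally important, which may not match a given workload.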
- Content/day:
TODO
TODO
- Available languages: dutch, english, farsi, french, german, italian, pinyin, portuguese, russian, spanish, swedish (a more comprehensive list is available in the ealdent fork)

- Technical reference: Evaluation of language identification methods
- Detection method: n-grams
- Training corpus: Wikipedia
- Built from scratch from the texts included with the gem
- Available languages: arabic, bulgarian, czech, danish, german, greek, english, estonian, spanish, farsi, finnish, french, irish, hebrew, hindi, croatian, italian, japanese, korean, hungarian, turkish, dutch, norwegian, polish, portuguese, romanian, russian, slovenian, swedish, thai, ukrainian, vietnamese, chinese
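
As a generic illustration of n-gram language detection (not the gem's actual code), the sketch below builds a character-trigram frequency profile per language and picks the profile with the highest cosine similarity to the input. The tiny `TOY_CORPORA` sentences are hypothetical stand-ins for a real training corpus such as Wikipedia, and cosine similarity is just one common scoring choice.

```python
import math
from collections import Counter

# Hypothetical toy training text; a real model would use far more data.
TOY_CORPORA = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the mat",
    "fr": "le renard brun saute par dessus le chien et le chat est sur le tapis",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams; pad the ends so the first/last words get edge n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(corpora: dict[str, str], n: int = 3) -> dict[str, Counter]:
    """Build one n-gram profile per language."""
    return {lang: char_ngrams(text, n) for lang, text in corpora.items()}


def detect(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the language whose profile is most similar to the input text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


profiles = train(TOY_CORPORA)
print(detect("the dog and the cat", profiles))  # → en
```

Cosine similarity normalises away input length, so short tweets and long training texts can be compared directly; the gem may weight or score its profiles differently.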

- Uses the textcat n-gram database (26 languages, derived from a European corpus)
- Available languages:
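
TextCat implements Cavnar and Trenkle's rank-order ("out-of-place") n-gram method: each text is reduced to a ranked list of its most frequent character n-grams, and a document is assigned to the language whose ranking is closest. The sketch below is a minimal version of that idea under stated assumptions; it does not load the real textcat database, the `TOY_CORPORA` texts are hypothetical, and the parameters (`max_n=5`, `top_k=300`, `max_penalty=300`) are illustrative rather than TextCat's own settings.

```python
from collections import Counter

# Hypothetical toy training text standing in for real TextCat profiles.
TOY_CORPORA = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the mat",
    "fr": "le renard brun saute par dessus le chien et le chat est sur le tapis",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}


def ranked_profile(text: str, max_n: int = 5, top_k: int = 300) -> dict[str, int]:
    """Map the top_k most frequent 1..max_n-grams to their rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # '_' marks word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top_k))}


def out_of_place(doc: dict[str, int], lang: dict[str, int], max_penalty: int = 300) -> int:
    """Sum of rank displacements; n-grams absent from the language profile cost max_penalty."""
    return sum(abs(rank - lang[gram]) if gram in lang else max_penalty
               for gram, rank in doc.items())


def classify(text: str, profiles: dict[str, dict[str, int]]) -> str:
    """Pick the language with the smallest out-of-place distance."""
    doc = ranked_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


profiles = {lang: ranked_profile(text) for lang, text in TOY_CORPORA.items()}
print(classify("le chien et le chat", profiles))  # → fr
```

Unlike the frequency-vector approach, this measure only compares ranks, so it is insensitive to absolute counts; the fixed penalty makes unseen n-grams the dominant cost.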

- Built for detecting the language of tweets
- Technical reference: http://blog.echen.me/2011/05/01/unsupervised-language-detection-algorithms/
- Available languages: du, en, sp
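
Tweets mix ordinary text with tokens that carry little language signal. As an illustrative assumption (not taken from the referenced post or from this library), a tweet-oriented detector might strip retweet markers, @mentions and URLs, and keep hashtag words without the `#`, before scoring. The regular expressions and `clean_tweet` below are hypothetical.

```python
import re

# Hypothetical cleanup rules; tune them for your own data.
RETWEET = re.compile(r"^RT\s+")    # leading "RT " marker
URL = re.compile(r"https?://\S+")  # links rarely say anything about the language
MENTION = re.compile(r"@\w+:?")    # @user handles, plus a trailing colon if present
HASHTAG = re.compile(r"#(\w+)")    # keep the word, drop the '#'


def clean_tweet(text: str) -> str:
    """Remove Twitter-specific tokens and collapse whitespace."""
    text = RETWEET.sub("", text)
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = HASHTAG.sub(r"\1", text)
    return " ".join(text.split())


print(clean_tweet("RT @bob: check this out http://t.co/abc #awesome"))
# → check this out awesome
```

Keeping hashtag words is a judgment call: they are often in the tweet's language, but English hashtags on non-English tweets can also mislead a detector.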

- Price: $20 per 1M characters
- Available languages: many (full list not given here)
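
At a per-character price, the cost of a job is simple arithmetic: total characters × $20 / 1,000,000. The sketch below assumes the provider bills on the same character count as Python's `len()`, which is an assumption; the provider's own counting rules (and current price) may differ.

```python
# Price quoted in these notes; check the provider's current price list.
USD_PER_MILLION_CHARS = 20.0


def estimated_cost(texts: list[str]) -> float:
    """Estimate the USD cost of detecting `texts`, billed per character."""
    chars = sum(len(t) for t in texts)  # assumption: billing counts Python characters
    return chars * USD_PER_MILLION_CHARS / 1_000_000


# 10,000 tweets of 140 characters each = 1.4M characters.
print(estimated_cost(["x" * 140] * 10_000))  # → 28.0
```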

- Demo: web interface
- Price: not listed
- Available languages: 95+ (European and Asian)

- Demo: API call
- Price: free for up to 5k requests and 1 MB per day; $5/month for up to 100k requests and 20 MB per day
- Available languages: 96
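
Because this service caps both requests and data volume per day, choosing a tier means checking both limits. The sketch below encodes the two tiers listed above; `TIERS` and `cheapest_tier` are hypothetical names, "1MB"/"20MB" are assumed to mean decimal megabytes of UTF-8 text, and one request per tweet is assumed.

```python
from typing import Optional

# (tier, max requests per day, max bytes per day), taken from the price note above.
# Assumption: "1MB"/"20MB" are decimal megabytes of UTF-8 text.
TIERS = [
    ("free", 5_000, 1_000_000),
    ("$5/month", 100_000, 20_000_000),
]


def cheapest_tier(requests_per_day: int, bytes_per_day: int) -> Optional[str]:
    """Return the first listed tier whose daily limits cover the given load."""
    for name, max_requests, max_bytes in TIERS:
        if requests_per_day <= max_requests and bytes_per_day <= max_bytes:
            return name
    return None  # the load exceeds every tier listed in these notes


tweets = ["x" * 140] * 20_000  # one request per tweet (assumption)
size = sum(len(t.encode("utf-8")) for t in tweets)
print(cheapest_tier(len(tweets), size))  # → $5/month
```

Here the 2.8 MB of text would fit the free tier's byte limit, but 20,000 requests exceed its 5k request cap, so the request count decides.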