Results

test_name                         , en   , fr   , de   , nl   , ro
alchemy_api-twitter_tests         , 100.0, 95.0 , 100.0, 85.0 , 85.0
google_api-twitter_tests          , 100.0, 100.0, 100.0, 100.0, 95.0
language_detector-twitter_tests   , 65.0 , 80.0 , 100.0, 95.0 , 55.0
language_detector_tc-twitter_tests, 85.0 , 100.0, 85.0 , 95.0 , 65.0
whatlanguage-twitter_tests        , 95.0 , 90.0 , 90.0 , 90.0 , 0.0
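
To compare detectors at a glance, the per-language scores can be collapsed into one mean per detector. The sketch below is only an illustration: it assumes the figures in the table are accuracy percentages and weights every language equally. The names `RESULTS`, `mean_accuracy` and `ranking` are made up for this example and are not part of the test suite.

```ruby
# Scores copied from the results table above (assumed to be accuracy in %).
RESULTS = {
  "alchemy_api"          => { en: 100.0, fr: 95.0,  de: 100.0, nl: 85.0,  ro: 85.0 },
  "google_api"           => { en: 100.0, fr: 100.0, de: 100.0, nl: 100.0, ro: 95.0 },
  "language_detector"    => { en: 65.0,  fr: 80.0,  de: 100.0, nl: 95.0,  ro: 55.0 },
  "language_detector_tc" => { en: 85.0,  fr: 100.0, de: 85.0,  nl: 95.0,  ro: 65.0 },
  "whatlanguage"         => { en: 95.0,  fr: 90.0,  de: 90.0,  nl: 90.0,  ro: 0.0 }
}.freeze

# Unweighted mean of one detector's scores across all test languages.
def mean_accuracy(scores)
  scores.values.sum / scores.size
end

# Detectors paired with their mean score, sorted from best to worst.
def ranking(results)
  results.map { |name, scores| [name, mean_accuracy(scores).round(1)] }
         .sort_by { |_, mean| -mean }
end

ranking(RESULTS).each do |name, mean|
  puts format("%-22s %5.1f", name, mean)
end
```

A single mean hides per-language gaps, such as the 0.0 Romanian score for whatlanguage, so the full table is still the better reference when one specific language matters.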

Needs estimation

  • Content/day:

TODO

Setup

TODO

Language detector libraries

  • Available languages: dutch, english, farsi, french, german, italian, pinyin, portuguese, russian, spanish, swedish

A more comprehensive list is available in the ealdent fork.

FM model

  • built from scratch using the texts included with the gem

  • Available languages: arabic, bulgarian, czech, danish, german, greek, english, estonian, spanish, farsi, finnish, french, irish, hebrew, hindi, croatian, italian, japanese, korean, hungarian, turkish, dutch, norwegian, polish, portuguese, romanian, russian, slovenian, swedish, thai, ukrainian, vietnamese, chinese

TC model

  • textcat ngram database (26 languages based on European corpus)

  • Available languages:

Language detector web services

  • Price: $20 per 1M chars
  • Available languages: tons
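
As a rough aid for the needs estimation, the per-character price above can be turned into a monthly figure. This is a minimal sketch: only the $20 per 1M characters rate comes from these notes. The helper name `detection_cost` and the daily volume (50,000 messages of 140 characters) are hypothetical placeholders to be replaced with real numbers once the content/day estimate is known.

```ruby
# Price taken from the listing above: $20 per 1M characters.
PRICE_PER_MILLION_CHARS = 20.0

# Estimated cost in USD for sending `chars` characters to the service.
def detection_cost(chars)
  chars / 1_000_000.0 * PRICE_PER_MILLION_CHARS
end

# Hypothetical volume, NOT a measured figure: 50,000 messages/day, 140 chars each.
chars_per_day = 50_000 * 140
monthly_cost  = detection_cost(chars_per_day * 30)

puts format("chars/day: %d, estimated cost per 30 days: $%.2f", chars_per_day, monthly_cost)
```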

Library

  • Demo: api call

  • Price: free for 5k requests, 1MB/day; $5/month for 100k requests, 20MB/day

  • Available languages: 96 languages

Library

wtf_language

Theoretical references