For technical details about language detection, I recommend this paper:
    require 'language_detector'

    # using generated model (built from scratch)
    d = LanguageDetector.new
    p d.detect('this text is in English')

    # using textcat n-gram model
    d = LanguageDetector.new('tc')
    p d.detect('this text is in English')
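
To give a feel for what an n-gram model does, here is a minimal sketch of the rank-order ("out-of-place") comparison commonly associated with textcat-style detectors. It is not this gem's implementation: the method names `ngram_profile`, `out_of_place` and `detect_language`, the `TRAINING` samples, and the profile size are all assumptions made for illustration, and real models are trained on far larger corpora.

```ruby
# Hypothetical sketch, NOT the language_detector gem's internals.

# Build a ranked profile: the most frequent character n-grams
# (lengths 1..max_n), most frequent first. Ties are broken alphabetically.
def ngram_profile(text, max_n: 3, top: 300)
  counts = Hash.new(0)
  # Collapse every run of non-letters into a single "_" word boundary.
  padded = "_#{text.downcase.gsub(/[^[:alpha:]]+/, '_')}_"
  (1..max_n).each do |n|
    padded.chars.each_cons(n) { |gram| counts[gram.join] += 1 }
  end
  counts.delete('_') # a bare boundary marker carries no information
  counts.sort_by { |gram, count| [-count, gram] }.first(top).map(&:first)
end

# Out-of-place distance: sum of rank differences between two profiles.
# N-grams missing from the language profile get the maximum penalty.
def out_of_place(doc_profile, lang_profile)
  ranks = lang_profile.each_with_index.to_h
  max_penalty = lang_profile.size
  doc_profile.each_with_index.sum do |gram, i|
    ranks.key?(gram) ? (ranks[gram] - i).abs : max_penalty
  end
end

# Pick the language whose profile is closest to the text's profile.
def detect_language(text, profiles)
  doc = ngram_profile(text)
  profiles.min_by { |_lang, profile| out_of_place(doc, profile) }.first
end

# Toy training data (assumption: far too small for real use).
TRAINING = {
  'en' => 'the quick brown fox jumps over the lazy dog and this is the text that we read in the morning',
  'de' => 'der schnelle braune fuchs springt und das ist der text den wir am morgen lesen'
}.freeze
PROFILES = TRAINING.transform_values { |sample| ngram_profile(sample) }

p detect_language('this text is in English', PROFILES)
```

With samples this small the result for short inputs is unreliable; the point is only to show how ranked n-gram profiles are built and compared.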
- Kevin Burton (training data): feedblog.org/2005/08/19/ngram-language-categorization-source/
- Feedbackmine: twitter.com/feedbackmine