lidruby

Language Identification with Ruby: probabilistic language identification with ruby1.9

NB: Requires ruby1.9.

A module that identifies which of a number of human languages a given text is written in.

We use a simple similarity measure between frequency counts of bigrams to compare an unknown text to a set of models of known languages.  
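The idea can be illustrated with a minimal sketch. It is not the module's actual code: it assumes character-level bigrams and uses cosine similarity as a stand-in for the unspecified similarity measure. The names bigram_counts, cosine_similarity and identify, and the tiny inline samples, are hypothetical.

```ruby
# Count overlapping character bigrams in a text.
# Assumption: bigrams are character-level; the module may differ.
def bigram_counts(text)
  counts = Hash.new(0)
  text.downcase.chars.to_a.each_cons(2) { |a, b| counts[a + b] += 1 }
  counts
end

# Cosine similarity between two sparse count vectors stored as hashes.
# Assumption: cosine stands in for the module's own similarity measure.
def cosine_similarity(a, b)
  dot = 0.0
  a.each { |gram, n| dot += n * b[gram] if b.key?(gram) }
  norm_a = Math.sqrt(a.values.inject(0.0) { |sum, n| sum + n * n })
  norm_b = Math.sqrt(b.values.inject(0.0) { |sum, n| sum + n * n })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Return the name of the model whose bigram profile is closest to the text.
def identify(text, models)
  unknown = bigram_counts(text)
  best = models.max_by { |_name, model| cosine_similarity(unknown, model) }
  best && best.first
end

# Toy models built from short inline samples (hypothetical data;
# real models would be built from much larger texts).
models = {
  "english" => bigram_counts("all human beings are born free and equal in dignity and rights"),
  "german"  => bigram_counts("alle menschen sind frei und gleich geboren")
}

puts identify("the rights of human beings", models)
```

With samples this small the scores are noisy; longer training texts give more stable bigram profiles.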

The language models are built from samples taken from:

  http://www.unicode.org/udhr/downloads.html

These samples are copied to models/.