ldig (Language Detection with Infinity Gram)

Updated for Python3

This is a prototype language detector for short messages such as tweets (Twitter), with 99.1% accuracy for 17 languages.

Usage

Install with python setup.py install, then:
from ldig import ldig
lmodel = ldig()
lmodel.detect_text("some text")

Data format

In the input data, each tweet is one line of a text file in the following format.

[label]\t[some metadata separated by '\t']\t[text without '\t']

[label] is a language code such as en, de, fr and so on. Like the metadata, it is optional. (ldig doesn't use the metadata or the label for detection, of course :D)
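
As a rough illustration of this layout, the sketch below splits one input line into its parts. It is not taken from ldig itself: the function name parse_line is hypothetical, and it assumes that the first field is the label whenever a line has more than one tab-separated field, and that a line with a single field is text only.

```python
def parse_line(line):
    """Split one tab-separated input line into (label, metadata, text).

    Assumption (not confirmed by ldig): with two or more fields the first
    is the label and the last is the text; a single field is text only.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 1:
        # No label and no metadata: the whole line is the text.
        return "", [], fields[0]
    label = fields[0]
    metadata = fields[1:-1]  # may be empty when only label and text are given
    text = fields[-1]
    return label, metadata, text


label, metadata, text = parse_line("en\t12345\tsome_user\tHello world\n")
print(label, metadata, text)
```

Because the text field may not contain a tab, taking the last field as the text is safe under these assumptions.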

The output data of ldig is in the following format.

[correct label]\t[detected label]\t[original metadata and text]
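
Given output lines in this format, per-language accuracy can be estimated by comparing the first two fields. The following is a minimal sketch, not part of ldig: the function name score_output is hypothetical, and it assumes that lines without a correct label (an empty first field) should be skipped.

```python
from collections import Counter


def score_output(lines):
    """Return {label: accuracy} from lines of
    '[correct label]\\t[detected label]\\t[original metadata and text]'.
    """
    total = Counter()
    correct = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: skip malformed lines and lines with no correct label.
        if len(fields) < 2 or not fields[0]:
            continue
        expected, detected = fields[0], fields[1]
        total[expected] += 1
        if expected == detected:
            correct[expected] += 1
    return {label: correct[label] / total[label] for label in total}


sample = ["en\ten\tHello there", "en\tde\tHi", "fr\tfr\tSalut"]
print(score_output(sample))
```

For a real run, the lines would come from the file ldig wrote rather than from an in-memory list.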

Estimation Tool

ldig has an estimation tool.

./server.py -m [model directory]

Open http://localhost:48000 and enter the target text in the text area. ldig then outputs the language probabilities and the feature parameters found in the text.

Supported Languages

cs Czech
da Danish
de German
en English
es Spanish
fi Finnish
fr French
id Indonesian
it Italian
nl Dutch
no Norwegian
pl Polish
pt Portuguese
ro Romanian
sv Swedish
tr Turkish
vi Vietnamese

Documents

Copyright & License

All code and resources are available under the MIT License.