Detectors

Tools used for this purpose:

*: Supports the Guarani language.

Installation

Pre-requisites:

Install the requirements: pip install -r requirements.txt

Download fastText lib.

~~Download the crubadan corpus.~~

# Commented out due to the low precision of textcat; use gcld3 instead.
"""
import nltk
nltk.download('crubadan')
nltk.download('punkt')
"""

Command Line Interface

All commands must be run from the src directory.

Detect language of tweets

python run.py [data_dir] [file_name_of_tweets] [language_lexicon] --detect_language --guarani

data_dir: Path to the data directory, relative to the src directory. Required.
file_name_of_tweets: Name of the CSV file containing the tweets. Required.
language_lexicon: Name of the file containing the word lexicon of the language to identify. Optional. The lexicon can be for any low-resource language, not only Guarani.
guarani: Flag indicating that the language to identify is Guarani (or another low-resource language). Optional. Needed when language_lexicon is given.
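
To make the argument layout above concrete, here is a minimal sketch of how a parser for this command line could be written with argparse. It is not the actual code in run.py: the function names build_parser and parse_cli, the help texts, and the example file names (tweets.csv, gn_lexicon.txt) are assumptions made for illustration, and the check that language_lexicon requires --guarani is one reading of "Needed when language_lexicon is given".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Detect the language of tweets stored in a CSV file.",
    )
    # Required positional arguments, in the documented order.
    parser.add_argument("data_dir", help="data directory, relative to src")
    parser.add_argument("file_name_of_tweets", help="CSV file with the tweets")
    # Optional positional: nargs="?" lets the lexicon be omitted.
    parser.add_argument(
        "language_lexicon",
        nargs="?",
        default=None,
        help="word lexicon of the low-resource language to identify",
    )
    # Boolean switches taken from the documented command line.
    parser.add_argument("--detect_language", action="store_true")
    parser.add_argument("--guarani", action="store_true")
    return parser


def parse_cli(argv):
    """Parse argv and enforce the assumed lexicon/--guarani dependency."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a lexicon is only meaningful together with --guarani.
    if args.language_lexicon is not None and not args.guarani:
        parser.error("language_lexicon requires --guarani")
    return args


# Example invocation with hypothetical file names.
args = parse_cli(
    ["../data", "tweets.csv", "gn_lexicon.txt", "--detect_language", "--guarani"]
)
print(args.data_dir, args.file_name_of_tweets, args.language_lexicon)
```

Declaring language_lexicon as an optional positional keeps the documented argument order while still allowing the shorter form python run.py [data_dir] [file_name_of_tweets] --detect_language.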

mmaguero/lang_detection
