getLanguages() and getShortTextLanguages() need documentation

Question



Hi!

This file contains two methods, getLanguages() and getShortTextLanguages():
https://github.com/optimaize/language-detector/blob/master/src/main/java/com/optimaize/langdetect/profiles/BuiltInLanguages.java

What's the difference between a language and a short text language?

Updated Javadocs would be nice, plus an answer right here of course :).

Regards /Johan

Answer 1 · 2016-04-21T14:03:43.000Z

The "short" part refers to the length of the text being analyzed: @shuyo generated those profiles using Twitter text as training data, and tweets were limited to 140 characters at the time. The short-text profiles therefore reflect the style of Twitter text, with many abbreviations and a terse writing style. The regular profiles, on the other hand, were generated from Wikipedia abstracts.
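To make the distinction concrete, here is a minimal sketch of how a caller might decide between the two profile sets. Everything in it is hypothetical: `ProfileChooser`, `ProfileSet`, the 140-character cut-off (borrowed from the tweet limit) and the placeholder language set are illustrations, not library API, and the sketch does not load any real profiles.

```java
import java.util.Set;

/**
 * Hypothetical helper (NOT part of optimaize/language-detector) that decides
 * which family of built-in profiles suits a given input.
 */
final class ProfileChooser {

    /** Which family of built-in profiles to load. */
    enum ProfileSet { SHORT_TEXT, FULL }

    /** Assumed cut-off: the 140-character tweet limit the short-text profiles were trained under. */
    static final int SHORT_TEXT_MAX_CHARS = 140;

    /** Placeholder set only; in practice consult BuiltInLanguages.getShortTextLanguages(). */
    static final Set<String> SHORT_TEXT_LANGUAGES = Set.of("en", "de", "fr", "es");

    private ProfileChooser() {}

    /**
     * Prefers the Twitter-trained short-text profiles for tweet-length input, but
     * only if every language the caller expects has such a profile; otherwise
     * falls back to the full (Wikipedia-abstract) profiles.
     */
    static ProfileSet choose(String text, Set<String> expectedLanguages) {
        int length = text.codePointCount(0, text.length());
        boolean tweetLength = length <= SHORT_TEXT_MAX_CHARS;
        boolean covered = SHORT_TEXT_LANGUAGES.containsAll(expectedLanguages);
        return (tweetLength && covered) ? ProfileSet.SHORT_TEXT : ProfileSet.FULL;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("en", "de");
        System.out.println(choose("omw, c u l8r", expected));           // SHORT_TEXT
        System.out.println(choose("omw, c u l8r", Set.of("en", "sw"))); // FULL: "sw" not in the placeholder set
        System.out.println(choose("a".repeat(500), expected));          // FULL: longer than a tweet
    }
}
```

The `containsAll` check exists because fewer languages have short-text profiles: if any expected language lacks one, the sketch falls back to the full profiles rather than risk never detecting it.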

Answer 2 · 2016-10-07T13:06:41.000Z

Added Javadoc:

```java
/**
 * Returns the languages for which the library provides full profiles.
 * Full profiles are generated from regular text, usually Wikipedia abstracts.
 * @return an immutable list
 */
public static List<LdLocale> getLanguages() {
    return languages;
}

/**
 * Returns the languages for which the library provides profiles created from short text.
 * Twitter was used as the source by @shuyo.
 * Far fewer languages have short-text profiles as of now.
 * @return an immutable list
 */
public static List<String> getShortTextLanguages() {
    return shortTextLanguages;
}
```
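As a rough illustration of the `@return immutable` contract and of the smaller short-text list, the sketch below mirrors the two getters with plain language-code strings so it compiles without the library. `LanguageLists`, `fullOnly()` and the codes it holds are hypothetical placeholders, not the real contents of `BuiltInLanguages`; `List.of` simply stands in for however the library builds its immutable lists.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for BuiltInLanguages, using language-code strings
 * instead of LdLocale. Illustrates that callers receive lists they cannot modify.
 */
final class LanguageLists {

    // Placeholder codes only; the real lists are defined in BuiltInLanguages.java.
    private static final List<String> LANGUAGES = List.of("de", "en", "es", "fr", "ja", "ru", "zh-CN");
    private static final List<String> SHORT_TEXT_LANGUAGES = List.of("de", "en", "es", "fr");

    private LanguageLists() {}

    /** Languages with full profiles (regular text such as Wikipedia abstracts). Immutable. */
    static List<String> getLanguages() {
        return LANGUAGES;
    }

    /** Languages with short-text (Twitter) profiles. Immutable; a subset of the full list in this sketch. */
    static List<String> getShortTextLanguages() {
        return SHORT_TEXT_LANGUAGES;
    }

    /** Languages that have a full profile but no short-text profile, in full-list order. */
    static List<String> fullOnly() {
        List<String> result = new ArrayList<>(LANGUAGES); // mutable copy we are allowed to change
        result.removeAll(SHORT_TEXT_LANGUAGES);
        return List.copyOf(result);                       // hand back an immutable result
    }

    public static void main(String[] args) {
        System.out.println(fullOnly()); // [ja, ru, zh-CN]
        try {
            getLanguages().add("xx");   // mutation attempt on an immutable list
        } catch (UnsupportedOperationException e) {
            System.out.println("list is immutable");
        }
    }
}
```

Callers that need a modifiable list should copy it first (as `fullOnly()` does) instead of mutating the returned list.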