getLanguages() and getShortTextLanguages() need documentation
Hi!
This file contains two methods, getLanguages() and getShortTextLanguages():
https://github.com/optimaize/language-detector/blob/master/src/main/java/com/optimaize/langdetect/profiles/BuiltInLanguages.java
What's the difference between a "language" and a "short text language"?
Updated Javadocs would be nice, plus an answer right here of course :).
Regards /Johan
The "short" part refers to the length of the text being analyzed: @shuyo generated those profiles using Twitter text as training data, where tweets are limited to 140 characters. The profiles therefore reflect the style of text used on Twitter, with many abbreviations and a terse writing style. The regular profiles, on the other hand, were generated from Wikipedia abstracts.
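The distinction suggests a rule of thumb for callers: short inputs resemble the Twitter training data, longer ones resemble the Wikipedia abstracts. The sketch below is a hypothetical helper, not part of language-detector's API; the 140-character cut-off is an assumption borrowed from the tweet limit mentioned above, and the library itself may not choose profiles by input length at all.

```java
// Hypothetical helper: decides which kind of built-in profile a caller
// might prefer, based only on input length. Not part of language-detector.
public class ProfileChooser {

    /** The two kinds of built-in profiles discussed in this thread. */
    public enum ProfileKind { SHORT_TEXT, FULL }

    // Assumed cut-off, borrowed from the 140-character tweet limit of the
    // Twitter data the short-text profiles were trained on.
    static final int SHORT_TEXT_MAX_CHARS = 140;

    public static ProfileKind choose(String text) {
        // Count code points rather than UTF-16 chars, so a character outside
        // the Basic Multilingual Plane counts once instead of twice.
        int length = text.codePointCount(0, text.length());
        return length <= SHORT_TEXT_MAX_CHARS ? ProfileKind.SHORT_TEXT : ProfileKind.FULL;
    }

    public static void main(String[] args) {
        System.out.println(choose("omg lol brb"));   // SHORT_TEXT
        System.out.println(choose("a".repeat(500))); // FULL
    }
}
```

A length threshold is only a heuristic: a long text written in tweet style, or a short excerpt of formal prose, would be misclassified by it.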
Added Javadoc:
    /**
     * Returns the languages for which the library provides full profiles.
     * Full profiles are generated from regular text, usually Wikipedia abstracts.
     * @return immutable
     */
    public static List<LdLocale> getLanguages() {
        return languages;
    }

    /**
     * Returns the languages for which the library provides profiles created from short text.
     * Twitter was used as the source by @shuyo.
     * Far fewer languages have short text profiles as of now.
     * @return immutable
     */
    public static List<String> getShortTextLanguages() {
        return shortTextLanguages;
    }
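Since far fewer languages have short-text profiles, a caller that prefers them needs to know which languages would have to fall back to a full profile. Below is a minimal sketch, assuming both lists have already been reduced to plain language-code strings. In the snippet above, getLanguages() returns List<LdLocale> while getShortTextLanguages() returns List<String>, so a real caller would first map them to one representation. The class name and the codes used here are illustrative, not the library's actual contents.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check: which languages have a full profile but no
// short-text profile? Not part of language-detector.
public class ShortTextCoverage {

    /** Returns the entries of allLanguages missing from shortTextLanguages, in original order. */
    public static List<String> missingShortText(List<String> allLanguages, List<String> shortTextLanguages) {
        // A set gives constant-time membership checks for the lookup below.
        Set<String> covered = new HashSet<>(shortTextLanguages);
        List<String> missing = new ArrayList<>();
        for (String lang : allLanguages) {
            if (!covered.contains(lang)) {
                missing.add(lang);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> all = List.of("de", "en", "fr", "ja", "sw");
        List<String> shortText = List.of("en", "fr", "ja");
        System.out.println(missingShortText(all, shortText)); // [de, sw]
    }
}
```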