ShortText algorithm sometimes yields zero probabilities for all languages
Closed · 1 comment
AlbertWeichselbraun commented
`detectBlockShortText` does not break once `CONV_THRESHOLD` has been reached. Depending on the text size, this leads to zero probabilities for all languages.
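To illustrate the kind of failure described here, the following is a minimal, self-contained sketch. It is not the library's actual code. It assumes that per-language probabilities are updated by multiplying in small per-n-gram weights, and that the zeros come from floating-point underflow when the loop keeps running after convergence; both are assumptions. The class name, the `detect` and `maxShare` helpers, the threshold value, and the weights are all hypothetical.

```java
import java.util.Arrays;

class ConvergenceSketch {
    // Hypothetical threshold; the real CONV_THRESHOLD value is not taken from the library.
    static final double CONV_THRESHOLD = 0.99999;

    /** Share of the total mass held by the largest entry (0 if everything is 0). */
    static double maxShare(double[] prob) {
        double sum = 0.0, max = 0.0;
        for (double p : prob) {
            sum += p;
            max = Math.max(max, p);
        }
        return sum == 0.0 ? 0.0 : max / sum;
    }

    /**
     * Multiplies one weight per language into prob for each n-gram.
     * weights[i][j] is the (made-up) weight of n-gram i for language j.
     */
    static double[] detect(double[][] weights, boolean breakOnConvergence) {
        int langs = weights[0].length;
        double[] prob = new double[langs];
        Arrays.fill(prob, 1.0 / langs);
        for (double[] w : weights) {
            for (int j = 0; j < langs; j++) {
                prob[j] *= w[j];
            }
            // Stop as soon as one language clearly dominates.
            if (breakOnConvergence && maxShare(prob) > CONV_THRESHOLD) {
                break;
            }
        }
        // Final normalization; an all-zero array stays all zeros.
        double sum = 0.0;
        for (double p : prob) sum += p;
        if (sum > 0.0) {
            for (int j = 0; j < langs; j++) prob[j] /= sum;
        }
        return prob;
    }

    public static void main(String[] args) {
        // 80 n-grams, two languages; language 0 is consistently more likely.
        double[][] weights = new double[80][];
        Arrays.fill(weights, new double[] {1e-5, 1e-7});
        System.out.println(Arrays.toString(detect(weights, true)));
        System.out.println(Arrays.toString(detect(weights, false)));
    }
}
```

In this sketch, the run with the early exit stops after a few n-grams and returns a usable distribution, while the run without it keeps multiplying until every entry underflows to `0.0`. Normalizing inside the loop would be another way to avoid the underflow.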
Example:
The Bulgarian sentence
Европа не трябва да стартира нов конкурентен маратон и изход с приватизация
yields a zero probability for all languages and, therefore, no result.
How to reproduce:
Add the following line to `runTests` in the `DataLanguageDetectorImplTest` unit test:
assertEquals(detector.getProbabilities(text("Европа не трябва да стартира нов конкурентен маратон и изход с приватизация")).get(0).getLocale().getLanguage(), "bg");
danielnaber commented
It seems to me the `break` introduces a bug; please see #91.