optimaize/language-detector

how to retrieve a particular language from a set of possible languages?

Closed this issue · 1 comments

[DetectedLanguage[ar:0.8556187887595297], DetectedLanguage[ur:0.12626434999662134]]

languageDetector.detect(textObject) prints the above line as the output but the function returns "optional.absent()". So can anyone tell me about how to pick the most probable language(here: arabic with 85.5%)?

They are 2 different methods. The important text is: Returns the best detected language if the algorithm is very confident.

Here's the whole interface for reference:

/**
 * Guesses the language of an input string or text.
 *
 * <p>See website for details.</p>
 *
 * <p>This detector cannot handle well:
 * Short input text, can work or give wrong results.
 * Text written in multiple languages. It likely returns the language for the most prominent text. It's not made for that.
 * Text written in languages for which the detector has no profile loaded. It may just return other similar languages.
 * </p>
 *
 * @author Fabian Kessler
 */
public interface LanguageDetector {

    /**
     * Returns the best detected language if the algorithm is very confident.
     *
     * <p>Note: you may want to use getProbabilities() instead. This here is very strict, and sometimes returns
     * absent even though the first choice in getProbabilities() is correct.</p>
     *
     * @param text You probably want a {@link com.optimaize.langdetect.text.TextObject}.
     * @return The language if confident, absent if unknown or not confident enough.
     */
    Optional<LdLocale> detect(CharSequence text);

    /**
     * Returns all languages with at least some likeliness.
     *
     * <p>There is a configurable cutoff applied for languages with very low probability.</p>
     *
     * <p>The way the algorithm currently works, it can be that, for example, this method returns a 0.99 for
     * Danish and less than 0.01 for Norwegian, and still they have almost the same chance. It would be nice if
     * this could be improved in future versions.</p>
     *
     * @param text You probably want a {@link com.optimaize.langdetect.text.TextObject}.
     * @return Sorted from better to worse. May be empty.
     *         It's empty if the program failed to detect any language, or if the input text did not
     *         contain any usable text (just noise).
     */
    List<DetectedLanguage> getProbabilities(CharSequence text);

}