optimaize/language-detector

Every Time It Returns only absent()



You might want to give some more info 😃

@hattewarsm are you referring to something like:

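        // Load the built-in language profiles (readAllBuiltIn() can throw IOException).
        List<LanguageProfile> languageProfiles = new LanguageProfileReader().readAllBuiltIn();
        // Build a detector with the standard n-gram extractor and default settings.
        LanguageDetector detector = LanguageDetectorBuilder.create(NgramExtractors.standard())
                .withProfiles(languageProfiles)
                .build();

        // Optional here is Guava's com.google.common.base.Optional, hence absent().
        Optional<LdLocale> detected = detector.detect("コンコルド001試作機は1969年3月2日にトゥールーズで初飛行した");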

and detected has value Optional.absent()?

I tested a few more examples:

  • hello -> absent
  • hello world, how are you doing? -> absent
  • hello world, how are you doing? This string is obviously English! -> Optional.of(en)

detect() only returns a language when the most confident candidate has a confidence of >= 0.9999, which does seem rather high. Anything below that comes back as Optional.absent(), which is presumably why the two shorter inputs above are rejected.

You may be better off calling detector.getProbabilities(text) and taking the most confident language yourself. The results are sorted, so that is .get(0), but check that the list isn't empty first.
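Here is a rough sketch of that idea: take the first entry of the sorted list and apply a threshold you choose yourself. To keep it runnable without the library on the classpath, `Candidate` is a hypothetical stand-in for whatever element type getProbabilities returns, `best` is a made-up helper name, the probabilities are invented, and it uses `java.util.Optional` rather than Guava's.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class TopCandidate {

    /** Hypothetical stand-in for one entry of the list returned by getProbabilities. */
    public static final class Candidate {
        public final String language;
        public final double probability;

        public Candidate(String language, double probability) {
            this.language = language;
            this.probability = probability;
        }
    }

    /**
     * Returns the language of the first candidate if it reaches minConfidence.
     * Assumes the list is already sorted by descending probability.
     */
    public static Optional<String> best(List<Candidate> sorted, double minConfidence) {
        if (sorted.isEmpty()) {
            return Optional.empty(); // nothing detected at all
        }
        Candidate top = sorted.get(0);
        return top.probability >= minConfidence
                ? Optional.of(top.language)
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Invented numbers, for illustration only.
        List<Candidate> candidates = Arrays.asList(
                new Candidate("en", 0.85),
                new Candidate("nl", 0.15));

        // A 0.9999 threshold rejects this input; a looser one accepts "en".
        System.out.println(best(candidates, 0.9999).isPresent());
        System.out.println(best(candidates, 0.5).isPresent());
    }
}
```

With the real detector the list would come from detector.getProbabilities(text) instead of being built by hand, and what threshold is sensible depends on how short your inputs are.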

If this isn't the case, I think you'd have to give more information for the ticket not to be rejected.