kwonoj/cld3-asm

Is there any option to restrict the detectable languages?

Closed · 1 comment

Is there any option to restrict the set of detectable languages?
I simply tested the word 'test', and it gives the result below:
{language: "de", probability: 0.6367550492286682, is_reliable: false, proportion: 1}
If I could narrow the detectable languages, it might give a better result.

No, there isn't. cld3-asm provides a 1:1 interface to cld3's language identifier (https://github.com/google/cld3/blob/fa5974a4d3b5e7934fcb166ff26ed6bfce68b18a/src/nnet_language_identifier.h#L67-L79) and does not intend to provide additional interfaces.
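Since the library has no such option, any narrowing would have to happen on the caller's side. The following is only a rough sketch of that idea, assuming you can obtain several candidate results for the same text. `pickAllowedLanguage` is a hypothetical helper, not part of cld3-asm, and the "en" candidate is made-up illustrative data. Filtering does not rescale the probabilities and does not make a short input any more reliable.

```typescript
// Shape of a single detection result, mirroring the object shown in the question.
interface LanguageResult {
  language: string;
  probability: number;
  is_reliable: boolean;
  proportion: number;
}

// Hypothetical helper (not part of cld3-asm): keep only candidates whose
// language is in the caller's allow-list and return the most probable one.
function pickAllowedLanguage(
  candidates: readonly LanguageResult[],
  allowed: ReadonlySet<string>,
): LanguageResult | undefined {
  let best: LanguageResult | undefined;
  for (const candidate of candidates) {
    if (!allowed.has(candidate.language)) continue;
    if (best === undefined || candidate.probability > best.probability) {
      best = candidate;
    }
  }
  return best;
}

// Illustrative candidates: the "de" entry is the result from the question,
// the "en" entry is invented for this example.
const candidates: LanguageResult[] = [
  { language: "de", probability: 0.6367550492286682, is_reliable: false, proportion: 1 },
  { language: "en", probability: 0.21, is_reliable: false, proportion: 1 },
];

// Only English and French are acceptable to this caller.
const picked = pickAllowedLanguage(candidates, new Set(["en", "fr"]));
```

If no candidate matches the allow-list, the helper returns `undefined`, which the caller should treat as "language unknown".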

cld3 itself recommends supplying a sufficiently long text buffer to get accurate detection, more than 140 characters at minimum. A short word like 'test' cannot produce anything close to an actual language detection, since there are too many possible false positives.
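One way to act on that recommendation is to skip detection for short input and to ignore results that are not flagged as reliable. This is a minimal sketch under stated assumptions: `detectIfLongEnough` and `MIN_RELIABLE_CHARS` are hypothetical names, and `detect` stands in for whatever call actually performs the detection in your code.

```typescript
// Shape of a single detection result, mirroring the object shown in the question.
interface LanguageResult {
  language: string;
  probability: number;
  is_reliable: boolean;
  proportion: number;
}

// Threshold taken from the recommendation above (more than 140 characters).
const MIN_RELIABLE_CHARS = 140;

// Hypothetical guard: only run detection on sufficiently long text and only
// accept results the detector itself marks as reliable.
function detectIfLongEnough(
  text: string,
  detect: (input: string) => LanguageResult,
  minChars: number = MIN_RELIABLE_CHARS,
): LanguageResult | undefined {
  const trimmed = text.trim();
  // Count Unicode code points rather than UTF-16 code units.
  const charCount = [...trimmed].length;
  if (charCount <= minChars) return undefined;
  const result = detect(trimmed);
  return result.is_reliable ? result : undefined;
}
```

With this guard, a one-word input such as 'test' yields `undefined` instead of a low-confidence guess like "de".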