curiosity-ai/catalyst

LanguageDetector.FromStoreAsync(): Why can't we pass an Array of Language?

Good evening
Thank you very much for sharing your great work!

When testing LanguageDetector, I often see Catalyst report languages that can be ruled out in advance for the texts in question.

Therefore, it would be very useful if we could pass LanguageDetector a List<Language> specifying which languages are possible at all.

Is your feature request related to a problem? Please describe.
For example, if one only uses English, German, and French texts, LanguageDetector often detects Norwegian.

Describe the solution you'd like
I am pretty sure that if we could tell LanguageDetector that the text can only be in one of three languages, it would pick the right language much more often 😃.
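
To illustrate the idea, here is a minimal sketch of the filtering step. It does not use Catalyst's API: `PickAllowed`, the string language codes and the scores are all hypothetical, and I am assuming the detector could expose a score per language, which I have not verified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Catalyst: returns the best-scoring
// language among the allowed ones, or null if none of them was scored.
static string? PickAllowed(IReadOnlyDictionary<string, double> scores, ISet<string> allowed)
{
    return scores
        .Where(kv => allowed.Contains(kv.Key))   // drop languages that are ruled out
        .OrderByDescending(kv => kv.Value)       // highest score first
        .Select(kv => kv.Key)
        .FirstOrDefault();
}

// Made-up scores for illustration: Norwegian wins if nothing is filtered.
var scores = new Dictionary<string, double>
{
    ["no"] = 0.41,
    ["de"] = 0.38,
    ["en"] = 0.15,
    ["fr"] = 0.06,
};
var allowed = new HashSet<string> { "en", "de", "fr" };

Console.WriteLine(PickAllowed(scores, allowed)); // prints "de"
```

With the restriction in place, Norwegian is never a candidate, so the best remaining language (German in this made-up example) is returned instead.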

Describe alternatives you've considered
I tried downloading only the NuGet language models for English, German and French, but LanguageDetector still detected Norwegian. Very strange. It looks like LanguageDetector downloads language models automatically (great!), but this feature is not mentioned in the code comments 😢

Thanks a lot, kind regards,
Thomas