snipsco/snips-nlu

Umlauts within phrase are causing odd intent matches

Corasonn opened this issue · 1 comments

Some of my entity values contain umlauts. When I try to recognize them with a specific intent, snips matches the phrase to any other intent that also contains this entity, even though the right intent would fit 100%. With any other value that has no umlaut, snips matches the right intent with a score of 1.0.

Expected:
Intents with entities with umlauts are matched correctly.

Environment:

  • OS: OSX 10.15.5
  • python version: 2.7
  • snips-nlu version: 0.20.1

I found the problem. When I have more than 10000 entity values, snips skips building some entity variations in order to improve build performance.
The PR that introduced this was #804.

Unfortunately, this seems to break umlauts when the "case" variation is missing. I forked the project and hardcoded the change (https://github.com/Corasonn/snips-nlu).
I'm not a Python developer, so if someone knows how to make this configurable via a flag, that would be great!
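
For anyone who wants to reproduce the behaviour described above, the following is a minimal sketch of a dataset generator. It is not taken from the reporter's project. The intent names, the `city` entity, the filler values and the helper `build_dataset` are all hypothetical, and the field names follow the snips-nlu JSON dataset format as I understand it, so they should be checked against the documentation for the installed version. The only thing taken from the issue is the figure of 10000 entity values.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def build_dataset(n_filler=10001):
    """Build a dataset dict with a few umlaut values plus many filler values.

    All names here (intents, entity, filler values) are hypothetical and
    only meant to push the entity above the 10000-value mark from the issue.
    """
    umlaut_values = ["München", "Köln", "Düsseldorf"]
    filler_values = ["Ort{}".format(i) for i in range(n_filler)]
    entity_data = [
        {"value": value, "synonyms": []}
        for value in umlaut_values + filler_values
    ]

    def utterance(prefix, value):
        # One text chunk followed by one annotated slot chunk.
        return {
            "data": [
                {"text": prefix},
                {"text": value, "entity": "city", "slot_name": "city"},
            ]
        }

    return {
        "language": "de",
        "intents": {
            # Two intents sharing the same entity, as in the report.
            "searchWeather": {
                "utterances": [
                    utterance("Wie ist das Wetter in ", v)
                    for v in umlaut_values
                ]
            },
            "bookTrain": {
                "utterances": [
                    utterance("Buche einen Zug nach ", v)
                    for v in umlaut_values
                ]
            },
        },
        "entities": {
            "city": {
                "data": entity_data,
                "use_synonyms": True,
                "automatically_extensible": False,
                "matching_strictness": 1.0,
            }
        },
    }
```

Passing the returned dict to `SnipsNLUEngine().fit(...)` and then calling `parse` on one of the training phrases, once with `n_filler=10001` and once with a small value such as `n_filler=10`, should show whether the intent changes once the entity grows past the threshold. This assumes the German language resources are installed; whether it triggers the exact problem from this issue is not confirmed.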