dborrelli/chat-intents

Label extraction only for English.

schmarion opened this issue · 1 comment

Hi,
I am using chat-intents and the clustering works very well.
However, I am working with French data and the label extraction gives poor results. I assume this is because the method always loads an English-specific spaCy model.
I was wondering whether the name of the loaded spaCy model, or at least the language, could be passed as a parameter, for example to apply_and_summarize_labels?
This way, the performance could be much better for all languages other than English.

Hi,
Interesting suggestion! Adding the language model as a parameter should be straightforward, but I'd want to ensure the approach still works well with other languages and I'm a bit bandwidth-limited at the moment. Happy to consider a PR though if you have something working.
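
For anyone picking this up, here is a minimal sketch of what a configurable model could look like. It is not the chat-intents implementation: `load_pipeline` and `extract_label` are hypothetical helper names, the default `en_core_web_sm` is an assumption, and the labeling rule (most frequent verb plus most frequent nouns) is a simplified stand-in for whatever the library actually does. Only `spacy.load`, `nlp.pipe`, and the token attributes used below are real spaCy API.

```python
from collections import Counter
from typing import Iterable


def load_pipeline(model_name: str = "en_core_web_sm"):
    """Load a spaCy pipeline by name, e.g. "fr_core_news_sm" for French.

    Hypothetical helper: the default model name is an assumption.
    """
    import spacy  # imported lazily so extract_label works with any pipeline object

    return spacy.load(model_name)


def extract_label(docs: Iterable[str], nlp, top_nouns: int = 2) -> str:
    """Build a short label from the most frequent verb and nouns in docs.

    `nlp` is any object with a spaCy-style `pipe` method, so the language
    is decided by the caller rather than hard-coded here.
    """
    verbs: Counter = Counter()
    nouns: Counter = Counter()

    for doc in nlp.pipe(docs):
        for token in doc:
            # Skip stop words and punctuation; they make poor labels.
            if token.is_stop or token.is_punct:
                continue
            lemma = token.lemma_.lower()
            if token.pos_ == "VERB":
                verbs[lemma] += 1
            elif token.pos_ == "NOUN":
                nouns[lemma] += 1

    # One verb followed by up to `top_nouns` nouns, joined with underscores.
    parts = [word for word, _ in verbs.most_common(1)]
    parts += [word for word, _ in nouns.most_common(top_nouns)]
    return "_".join(parts)
```

With French data this would be called as `extract_label(texts, load_pipeline("fr_core_news_sm"))`, assuming that model has been installed first with `python -m spacy download fr_core_news_sm`. Passing the loaded pipeline rather than only its name also means the model is loaded once and can be reused across clusters.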