How to count out of vocabulary words
svetlana21 commented
Hello,
Is there a way to count the OOV (out-of-vocabulary) words in my data? I want to evaluate how well the fastText model covers my data. Also, how can I get the words that actually exist in the model, and can I ignore OOVs somehow while working with the model?
Thank you
vrasneur commented
Hello @svetlana21,
You have the model.words attribute, which contains a list of all the words that exist in the model.
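A minimal sketch of how that word list could be used to count OOVs and measure coverage. The helper name oov_report and the toy data are hypothetical; with a loaded model you would pass model.words as the vocabulary and your own tokenized text as the tokens.

```python
from collections import Counter

def oov_report(tokens, vocabulary):
    """Return (Counter of OOV tokens, fraction of tokens found in the vocabulary)."""
    tokens = list(tokens)
    vocab = set(vocabulary)  # a set makes each membership test O(1)
    oov = Counter(t for t in tokens if t not in vocab)
    n_oov = sum(oov.values())
    coverage = (len(tokens) - n_oov) / len(tokens) if tokens else 0.0
    return oov, coverage

# Toy stand-ins (assumption): replace `vocabulary` with model.words
# and `tokens` with your own tokenized data.
vocabulary = ["the", "cat", "sat"]
tokens = "the cat sat on the mat".split()

oov_counts, coverage = oov_report(tokens, vocabulary)

# To ignore OOVs, keep only the tokens that were found in the vocabulary.
in_vocab_tokens = [t for t in tokens if t not in oov_counts]

print(oov_counts)
print(f"token coverage: {coverage:.2%}")
print(in_vocab_tokens)
```

Filtering on the same vocabulary set is one simple way to skip OOVs before querying the model. Whether you count coverage per token (as here) or per distinct word type depends on what you want to evaluate.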
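fastText uses "subwords" (character n-grams) to handle OOV words, so a word that is missing from the vocabulary can still get a vector built from its subwords. You can use model.get_substrings(word) to get the subwords of a word.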
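To illustrate what subwords look like, here is a rough pure-Python sketch of character n-gram extraction with the < and > word-boundary markers. It is not the library's implementation. The function name char_ngrams is hypothetical, and the default lengths (3 to 6) are an assumption, since the actual range depends on how the model was trained.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List the character n-grams of `word`, wrapped in boundary markers."""
    wrapped = f"<{word}>"  # mark the start and end of the word
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)          # every n-gram length
        for i in range(len(wrapped) - n + 1)      # every start position
    ]

trigrams = char_ngrams("where", min_n=3, max_n=3)
print(trigrams)  # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because an OOV word shares many of these n-grams with words seen in training, the model can still produce a vector for it.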