vrasneur/pyfasttext

How to count out of vocabulary words


Hello,
Is there a way to count out-of-vocabulary (OOV) words in my data? I want to evaluate how well the fastText model covers my data. Also, how can I get the words that actually exist in the model, and can I somehow ignore OOVs while working with the model?
Thank you

Hello @svetlana21,

The model.words attribute contains a list of all the words in the model's vocabulary. Any word in your data that is not in that list is an OOV.
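
A minimal sketch of counting OOVs and measuring coverage with that list. The oov_report helper and the sample vocabulary are made up for illustration; with a loaded model you would pass model.words as the vocabulary. Tokenization here is a plain whitespace split, which is an assumption about your data.

```python
from collections import Counter


def oov_report(tokens, vocabulary):
    """Count in-vocabulary and OOV tokens and compute token coverage.

    Hypothetical helper, not part of pyfasttext.
    """
    vocab = set(vocabulary)  # a set makes each membership test cheap
    known = Counter()
    unknown = Counter()
    for tok in tokens:
        if tok in vocab:
            known[tok] += 1
        else:
            unknown[tok] += 1
    n_known = sum(known.values())
    total = n_known + sum(unknown.values())
    coverage = n_known / total if total else 0.0
    return known, unknown, coverage


# Stand-in for model.words (assumption: replace with the real attribute).
vocabulary = ["the", "cat", "sat"]
tokens = "the cat sat on the mat".split()

known, unknown, coverage = oov_report(tokens, vocabulary)
print("OOV tokens:", sum(unknown.values()), "distinct OOVs:", len(unknown))
print("coverage: {:.2%}".format(coverage))
```

To ignore OOVs, keep only the tokens that appear in known before looking up vectors.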

fastText uses "subwords" (character n-grams) to handle OOV words, so it can still build a vector for a word that is not in model.words. You can use model.get_substrings(word) to get the subwords.
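
To illustrate what the subwords look like, here is a rough sketch of character n-gram extraction in the style fastText describes: the word is wrapped in "<" and ">" and every substring of length minn to maxn is taken. The char_ngrams function is a hypothetical stand-in, not pyfasttext's implementation, and the defaults of 3 and 6 are assumed from fastText's usual settings; your model may have been trained with different values.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return character n-grams of the word wrapped in boundary markers.

    Illustrative only: the real library works on the trained model's
    settings and only some n-grams end up with learned vectors.
    """
    wrapped = "<" + word + ">"
    grams = []
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if start + n > len(wrapped):
                break  # longer n-grams would run past the end
            grams.append(wrapped[start:start + n])
    return grams


print(char_ngrams("where", minn=3, maxn=3))
```

Because an OOV word still shares n-grams like these with known words, its vector can be assembled from them, which is why OOVs do not have to be dropped.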