SINr-Embeddings/sinr

Missing words in similarity evaluation regarding minfreq

Closed this issue · 0 comments

Describe the bug
The number of missing words in the similarity evaluation increases when min_freq is increased. It should not increase, since the exception list is meant to prevent this.

To Reproduce

  1. Evaluate similarity with a corpus generated with a given min_freq
  2. Increase min_freq, regenerate the corpus, and evaluate similarity again

Expected behavior
The number of missing words should not increase when min_freq increases. This number is supposed to be the same for a given corpus, whatever the min_freq.
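
The expected behavior can be illustrated with a minimal sketch. It does not use the sinr API: `build_vocab`, `count_missing`, the toy corpus, and the assumption that the exception list contains the evaluation words are all hypothetical, chosen only to show why the missing-word count should stay constant when an exception list is honored.

```python
from collections import Counter


def build_vocab(tokens, min_freq, exceptions=frozenset()):
    """Keep words seen at least min_freq times, plus any corpus word in exceptions.

    Hypothetical helper: a stand-in for the vocabulary filtering step,
    not the actual sinr preprocessing code.
    """
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c >= min_freq or w in exceptions}


def count_missing(eval_words, vocab):
    """Number of evaluation words that are absent from the vocabulary."""
    return sum(1 for w in eval_words if w not in vocab)


# Toy corpus: cat=3, dog=2, bird=1, fish=1. "tiger" never occurs.
tokens = "cat dog cat bird dog cat fish".split()
eval_words = ["cat", "bird", "tiger"]

# With the exception list: only words truly absent from the corpus are missing.
missing_with = {
    mf: count_missing(eval_words, build_vocab(tokens, mf, exceptions=set(eval_words)))
    for mf in (1, 2, 3)
}

# Without the exception list: rare evaluation words are filtered out as well.
missing_without = {
    mf: count_missing(eval_words, build_vocab(tokens, mf))
    for mf in (1, 2, 3)
}

print(missing_with)     # {1: 1, 2: 1, 3: 1}
print(missing_without)  # {1: 1, 2: 2, 3: 2}
```

In this sketch the count without exceptions grows with min_freq, which matches the symptom reported here, while the count with exceptions stays at the number of evaluation words that never occur in the corpus.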