explosion/spaCy

oov word prob is zero

rajhans opened this issue · 9 comments

Hi,

I just installed 1.0.1 on Mac OS X. I find that the model is assigning a zero log-probability to OOV (out-of-vocabulary) words:
import spacy
nlp = spacy.load('en')
x = nlp(u'this is an oovword')
[(t, t.is_oov, t.prob) for t in x]

[(this, False, -5.36181640625), (is, False, -4.457748889923096), (an, False, -6.014852046966553), (oovword, True, 0.0)]
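
For context on why the last value stands out: `t.prob` is a log-probability, so 0.0 corresponds to a probability of 1.0, the highest possible value, rather than a small one. A minimal sketch using only the numbers from the output above (the variable names are illustrative and not part of spaCy; the natural log is assumed, though 0.0 maps to 1.0 in any base):

```python
import math

# Log-probabilities copied from the output above.
log_probs = {
    "this": -5.36181640625,
    "is": -4.457748889923096,
    "an": -6.014852046966553,
    "oovword": 0.0,
}

# exp() maps a (natural) log-probability back to a plain probability.
probs = {word: math.exp(lp) for word, lp in log_probs.items()}

for word, p in probs.items():
    print(f"{word}: {p:.6f}")

# The OOV token ends up with the largest probability of all.
most_likely = max(probs, key=probs.get)
```

So the unseen word is effectively scored as more likely than "this", "is", or "an", which is the opposite of what an OOV estimate should do.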

More context: I had the same experience as in issue #535, so I followed this sequence of steps:
uninstall 0.100.0, install 1.0.1, download data, uninstall 1.0.1, install 1.0.1, download data

Can you do

ls `python -c "import spacy; print(spacy.get_data_path())"`

And tell me what you see?

ls `python -c "import spacy; print(spacy.get_data_path())"`

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'get_data_path'

Ah.

ls `python -c "import spacy; print(spacy.util.get_data_path())"`

Here it is:

cache cookies.txt en-1.1.0 en_glove_cc_300_1m_vectors-1.0.0

Also
ls -R `python -c "import spacy; print(spacy.util.get_data_path())"`

gives the following in the en-1.1.0/vocab directory:

gazetteer.json lexemes.bin serializer.json tag_map.json
lemma_rules.json oov_prob strings.json vec.bin
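
The listing includes an `oov_prob` file, which suggests the model ships a dedicated log-probability for unseen words. A rough sketch of the fallback one would expect, assuming the file holds a single float as text. `load_oov_prob` and `lookup_prob` are hypothetical helpers for illustration, not spaCy functions, and the value -20.0 is made up:

```python
from pathlib import Path
import tempfile


def load_oov_prob(path):
    """Read one float from a text file (assumed file format)."""
    return float(Path(path).read_text(encoding="utf8").strip())


def lookup_prob(word, known_probs, oov_prob):
    """Return the stored log-probability, or oov_prob for unseen words."""
    return known_probs.get(word, oov_prob)


# Demo with a temporary file standing in for en-1.1.0/vocab/oov_prob.
with tempfile.TemporaryDirectory() as tmp:
    oov_file = Path(tmp) / "oov_prob"
    oov_file.write_text("-20.0", encoding="utf8")  # made-up value
    oov_prob = load_oov_prob(oov_file)

# Known log-probabilities taken from the output earlier in the thread.
known = {"this": -5.36181640625, "is": -4.457748889923096}

in_vocab = lookup_prob("this", known, oov_prob)
unseen = lookup_prob("oovword", known, oov_prob)
```

With a fallback like this, an unseen word gets a log-probability well below that of any in-vocabulary word instead of 0.0.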

Sorry, I was in a hurry and didn't read your issue properly. This is obviously a bug. One sec.

Published v1.0.3 on PyPI. Should be fixed. Thanks again!

Fantastic! Thanks Matthew.
