practical-nlp/practical-nlp-code

Issue with code sample in book from chapter 3 "PRE-TRAINED WORD EMBEDDINGS"

david-sitsky opened this issue · 0 comments

Hi - apologies if this is the wrong place to report this. I have been reading the online version of this book, and I tried running the following code sample from chapter 3, with the path to the model updated:

from gensim.models import Word2Vec, KeyedVectors
pretrainedpath = "NLPBookTut/GoogleNews-vectors-negative300.bin"
w2v_model = KeyedVectors.load_word2vec_format(pretrainedpath, binary=True)
print('done loading Word2Vec')
print(len(w2v_model.vocab)) #Number of words in the vocabulary.
print(w2v_model.most_similar['beautiful'])
W2v_model['beautiful']

It fails with the following:

$ python3 word2vec.py
done loading Word2Vec
Traceback (most recent call last):
  File "word2vec.py", line 5, in <module>
    print(len(w2v_model.vocab)) #Number of words in the vocabulary.
  File "/home/sits/.local/lib/python3.8/site-packages/gensim/models/keyedvectors.py", line 645, in vocab
    raise AttributeError(
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

I can see the code for Ch3 has been changed to take this into account, e.g., by removing the len() call and using code like:

print(w2v_model.most_similar('beautiful'))
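For what it's worth, the vocabulary count probably does not need to be dropped: the error message points to the .key_to_index dict as the replacement for .vocab. Below is a minimal sketch of a helper that works with either attribute. The name vocab_size and the stand-in objects are my own (hypothetical, not from Gensim or the book); they are only there so the snippet runs without downloading the GoogleNews model.

```python
from types import SimpleNamespace


def vocab_size(kv):
    """Return the number of words in a KeyedVectors-like object.

    Hypothetical helper, not part of Gensim or the book's code.
    """
    # Gensim 4.x exposes the vocabulary as the key_to_index dict.
    key_to_index = getattr(kv, "key_to_index", None)
    if key_to_index is not None:
        return len(key_to_index)
    # Fallback for Gensim 3.x, where the vocab dict still exists.
    return len(kv.vocab)


# Stand-in objects (assumptions for illustration only) that mimic the
# attribute layout of each Gensim generation.
gensim4_like = SimpleNamespace(key_to_index={"beautiful": 0, "gorgeous": 1})
gensim3_like = SimpleNamespace(vocab={"beautiful": object()})

print(vocab_size(gensim4_like))
print(vocab_size(gensim3_like))

# With the real model loaded as in the sample above (assumes Gensim 4.x):
# print(vocab_size(w2v_model))
# print(w2v_model.most_similar('beautiful'))
# print(w2v_model['beautiful'])
```

If the book only targets Gensim 4.x, the helper is unnecessary and len(w2v_model.key_to_index) on its own should be enough.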

Can the online book be updated with the correct code?