Gensim Featurizer - Object has no attribute vocab
RWolfing opened this issue · 4 comments
I am following this guide to add the GensimFeaturizer to my chatbot. Unfortunately, I now get an error when running "rasa train":
  File "...\lib\site-packages\rasa_nlu_examples\featurizers\dense\gensim_featurizer.py", line 68, in train
    self.set_gensim_features(example, attribute)
  File "...\lib\site-packages\rasa_nlu_examples\featurizers\dense\gensim_featurizer.py", line 80, in set_gensim_features
    for t in tokens
  File "...\lib\site-packages\rasa_nlu_examples\featurizers\dense\gensim_featurizer.py", line 80, in <listcomp>
    for t in tokens
  File "...\lib\site-packages\gensim\models\keyedvectors.py", line 418, in __contains__
    return word in self.vocab
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vocab'
This is the code snippet used to create the word vectors:
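from gensim.models import Word2Vec
from gensim.test.utils import common_texts

# Tried with gensim 4 and gensim 3.8.
# Note: vector_size is the gensim 4 parameter name; gensim 3.8 calls it size.
model = Word2Vec(sentences=common_texts, vector_size=10, window=3,
                 min_count=1, workers=2)

# Save only the KeyedVectors (word -> vector lookup), not the full model.
model.wv.save(r'../01-dataset/wordvectors.kv')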
Running this snippet creates one file wordvectors.kv.
What makes me scratch my head:
I thought this snippet should also create three additional .npy files: wordvectors.kv.vectors.npy, wordvectors.kv.vectors_ngrams.npy, and wordvectors.kv.vectors_vocab.npy. I downgraded gensim to 3.8, but this changes neither the result nor the error.
Edit: The .npy files are only generated if the vector arrays get too large, so their absence is not the problem. That leaves the question of why it is not working. 🤔
What packages/versions I am using
rasa: 2.8.2
rasa-nlu-examples: 0.2.5
gensim: 4.0.1
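Since more than one gensim version was tried, it may be worth confirming which versions are actually installed in the environment that runs "rasa train". A minimal sketch using only the standard library; the helper names installed_version and major_version are hypothetical and not part of rasa or gensim:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Extract the major version number from a string such as '4.0.1'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    # Package names as listed above; run this inside the same virtual
    # environment that is used for training.
    for name in ("rasa", "rasa-nlu-examples", "gensim"):
        print(name, installed_version(name))
```

If the gensim version printed here differs from the one used to save wordvectors.kv, that mismatch would be a plausible place to start looking.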
Mhm... this is indeed strange and deserves further exploration. Unfortunately, I won't be able to look into it in the short term, since I'm preoccupied with the transition to Rasa 3.0.
Is there a particular reason why you're interested in using Gensim here? Is there a reason why BytePair or FastText may not suffice?
No, not really; I just don't know how to create the word vectors with the other frameworks 😅. Alright, thanks for the fast answer 👍. I will have a look and try to migrate to a different solution.
I fear the Gensim issue may be even more complex than I thought before. It seems the BytePair embeddings still use Gensim v3, which might prevent an upgrade to v4.
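For reference, the v3/v4 difference that matters for this traceback is where the vocabulary lives: Gensim 3.x KeyedVectors expose it as vocab, while Gensim 4.x uses key_to_index. A rough sketch of a membership check that tolerates both layouts is below. The has_key helper is hypothetical and is not how rasa-nlu-examples implements the featurizer; the stub objects only stand in for real KeyedVectors so the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def has_key(keyed_vectors, word):
    """Check whether `word` is known, for either attribute layout."""
    # Gensim 4.x stores the vocabulary in `key_to_index`.
    index = getattr(keyed_vectors, "key_to_index", None)
    if index is not None:
        return word in index
    # Gensim 3.x stores it in `vocab`.
    vocab = getattr(keyed_vectors, "vocab", None)
    if vocab is not None:
        return word in vocab
    raise AttributeError("object has neither 'key_to_index' nor 'vocab'")


# Stand-ins for real KeyedVectors objects (assumption: only the
# vocabulary attribute matters for this check).
v4_like = SimpleNamespace(key_to_index={"human": 0, "computer": 1})
v3_like = SimpleNamespace(vocab={"human": object(), "computer": object()})

print(has_key(v4_like, "human"))
print(has_key(v3_like, "graph"))
```

This only papers over the lookup; it does not address whether a file saved by one major version can be loaded reliably by the other.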