RasaHQ/rasa-nlu-examples

Gensim Featurizer - Object has no attribute vocab

RWolfing opened this issue · 4 comments

I am following this guide to add the GensimFeaturizer to my chatbot. Unfortunately, I now get the following error when running "rasa train":

  File "...\lib\site-packages\rasa_nlu_examples\featurizers\dense\gensim_featurizer.py", line 68, in train
    self.set_gensim_features(example, attribute)
  File "...\lib\site-packages\rasa_nlu_examples\featurizers\dense\gensim_featurizer.py", line 80, in set_gensim_features
    for t in tokens
  File "...\lib\site-packages\rasa_nlu_examples\featurizers\dense\gensim_featurizer.py", line 80, in <listcomp>
    for t in tokens
  File "...\lib\site-packages\gensim\models\keyedvectors.py", line 418, in __contains__
    return word in self.vocab
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vocab'

This is the code snippet used to create the word vectors:

from gensim.models import Word2Vec
from gensim.test.utils import common_texts
# Tried with gensim 4 and gensim 3.8
model = Word2Vec(sentences=common_texts, vector_size=10, window=3,
                 min_count=1, workers=2)
model.wv.save(r'../01-dataset/wordvectors.kv')

Running this snippet creates one file wordvectors.kv.

What makes me scratch my head:
I expected this snippet to also create three additional .npy files: wordvectors.kv.vectors.npy, wordvectors.kv.vectors_ngrams.npy, and wordvectors.kv.vectors_vocab.npy. I downgraded gensim to 3.8, but that did not change the result or the error.

Edit: The .npy files are only generated if the vector arrays get too large. So the missing files are not the cause, which brings me back to the question of why it is not working. 🤔

What packages/versions I am using

rasa: 2.8.2
rasa-nlu-examples: 0.2.5
gensim: 4.0.1
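
Because the traceback fails on `self.vocab`, an attribute that gensim 4 replaced with `key_to_index`, it may be worth confirming which gensim version the environment that runs "rasa train" actually has installed. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the vectors were saved under gensim 4 and loaded under gensim 3.x, or the other way around. The sketch below uses only the standard library; the helper names `installed_version` and `major_version` are hypothetical and not part of rasa or gensim.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string such as '4.0.1'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    # Package names taken from the version list above.
    for name in ("rasa", "rasa-nlu-examples", "gensim"):
        print(name, installed_version(name))
```

Running this inside the same virtual environment that runs "rasa train", and again in the environment that created wordvectors.kv, would show whether the two gensim major versions differ.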

Mhm... this is indeed strange and deserves further exploration. Unfortunately, I won't be able to look into it in the short term, since I'm preoccupied with the transition to Rasa 3.0.

Is there a particular reason why you're interested in using Gensim here? Is there a reason why BytePair or FastText may not suffice?

No, not really. I just don't know how to create the word vectors with the other frameworks 😅. Alright, thanks for the fast answer 👍. I will have a look and try to migrate to a different solution.

I fear the Gensim issue may be even more complex than I thought before. It seems the BytePair embeddings still use Gensim v3, which might prevent an upgrade to v4.
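
To illustrate what supporting both major versions might involve: gensim 3 exposes the vocabulary of a KeyedVectors object as `vocab`, while gensim 4 exposes it as `key_to_index`. A minimal sketch of a version-agnostic membership check follows. The helpers `vocabulary_of` and `has_token` are hypothetical, are not how rasa-nlu-examples implements its featurizer, and are exercised here with stand-in objects instead of real gensim models.

```python
from types import SimpleNamespace


def vocabulary_of(keyed_vectors):
    """Return the token mapping of a KeyedVectors-like object.

    Tries the gensim 4 attribute (`key_to_index`) first, then the
    gensim 3 attribute (`vocab`).
    """
    for attr in ("key_to_index", "vocab"):
        mapping = getattr(keyed_vectors, attr, None)
        if mapping is not None:
            return mapping
    raise AttributeError("object exposes neither 'key_to_index' nor 'vocab'")


def has_token(keyed_vectors, token):
    """Check whether `token` is in the vocabulary, regardless of gensim version."""
    return token in vocabulary_of(keyed_vectors)


# Stand-ins for illustration only; real code would pass loaded KeyedVectors.
v4_like = SimpleNamespace(key_to_index={"human": 0, "computer": 1})
v3_like = SimpleNamespace(vocab={"human": object(), "computer": object()})

print(has_token(v4_like, "human"))
print(has_token(v3_like, "graph"))
```

A shim like this only covers vocabulary lookups. It would not by itself make a file saved under one major version loadable under the other.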

I am closing this issue in favor of #110.