codebasics/nlp-tutorials

'Word2VecKeyedVectors' object has no attribute 'get_mean_vector'

shiv425 opened this issue · 4 comments

While converting the tokens of a complete sentence to a vector in the preprocess_and_vectorize method, I got the error "'Word2VecKeyedVectors' object has no attribute 'get_mean_vector'".

I tried converting each token to a vector and then taking the mean with np.mean, but while converting df['Text'] to vector form I get errors like "Key 'u.s.-based' not present", "Key ' ' not present", "Key '2018' not present", etc. Please help.

I think he used an old version of the gensim library; a lot of attributes changed from 3.8 to 4.0. I am facing the same issue and tried a couple of things, but nothing helped. To be honest, the library is poorly documented: I have been searching for hours and couldn't find anything useful.

`def preprocess_and_vectorize(text):
    # remove stop words and punctuation, then lemmatize the text
    doc = nlp(text)
    filtered_tokens = []
    arr = []
    for token in doc:
        if token.is_stop or token.is_punct:
            continue
        filtered_tokens.append(token.lemma_)
    for token in filtered_tokens:
        try:
            arr.append(wv[token])
        except KeyError:
            # token has no vector in wv, skip it
            continue

    # note: if no token had a vector, arr is empty and np.mean returns nan
    return np.mean(arr, axis=0)`

I used this code. The try/except is there because many words have no vector in wv.
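
For anyone who wants to avoid the try/except, here is a minimal sketch of the same idea using a membership test, with a zero-vector fallback for sentences where no token is known. The helper name `mean_vector`, the `dim` parameter, and the toy dictionary `toy_wv` are made up for illustration. The dictionary only stands in for the real model so the snippet runs without downloading anything. With gensim the same `in` and `[]` operations should work on the loaded keyed vectors, and `dim` could be taken from `wv.vector_size`.

```python
import numpy as np

def mean_vector(tokens, wv, dim):
    """Average the vectors of the tokens that wv knows; return zeros if none are known."""
    vectors = [wv[t] for t in tokens if t in wv]
    if not vectors:
        # hypothetical fallback: a zero vector keeps every row the same shape
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vectors, axis=0)

# toy stand-in for the word-vector model (assumption: not real embeddings)
toy_wv = {
    "stock": np.array([1.0, 0.0, 2.0], dtype=np.float32),
    "market": np.array([3.0, 2.0, 0.0], dtype=np.float32),
}

# "u.s.-based" is missing from toy_wv, so it is skipped
sentence_vec = mean_vector(["stock", "u.s.-based", "market"], toy_wv, dim=3)

# none of these tokens are known, so the zero-vector fallback is used
empty_vec = mean_vector(["2018", " "], toy_wv, dim=3)
```

The fallback matters when the result is stacked into a 2D array for a classifier: a nan or a scalar in one row would break the stacking.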

Solution to the problem

This is the alternative I have found for this problem and it's working

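import spacy
import numpy as np

nlp = spacy.load("en_core_web_lg")

def preprocess_and_vectorize(text):
    # wv is the gensim word-vector model loaded earlier
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        if token.is_punct or token.is_stop:
            continue
        filtered_tokens.append(token.lemma_)
    # axis=0 averages across tokens and gives one vector per sentence;
    # without it np.mean collapses everything into a single number.
    # A token that is missing from wv still raises KeyError here.
    return np.mean(wv[filtered_tokens], axis=0)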
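
A note on the original error, plus a sketch that should run on both old and new gensim. The class name `Word2VecKeyedVectors` in the message suggests a gensim 3.x install, and as far as I know `get_mean_vector` was only added in a later 4.x release, so upgrading gensim is probably the simplest fix. If upgrading is not an option, the hypothetical helper below uses `get_mean_vector` when the object has it and falls back to a plain `np.mean` otherwise. It passes `pre_normalize=False` on the assumption that the built-in method otherwise normalizes each vector before averaging, which would give different numbers from the plain mean. `sentence_vector` and `toy_wv` are illustrative names, and the dictionary is only a stand-in for the real model.

```python
import numpy as np

def sentence_vector(tokens, wv):
    """Mean vector of the known tokens, or None if no token is in wv."""
    known = [t for t in tokens if t in wv]
    if not known:
        return None
    if hasattr(wv, "get_mean_vector"):
        # newer gensim: use the built-in method, without per-vector normalization
        return wv.get_mean_vector(known, pre_normalize=False)
    # older gensim (or any mapping of token -> vector): average manually
    return np.mean([wv[t] for t in known], axis=0)

# toy stand-in without get_mean_vector, so the fallback branch is taken
toy_wv = {
    "price": np.array([1.0, 2.0]),
    "rise": np.array([3.0, 0.0]),
}
fallback_vec = sentence_vector(["price", "2018", "rise"], toy_wv)
```

Returning None for sentences with no known tokens makes them easy to drop from the DataFrame before training.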