About embedding_weights
chunjoe opened this issue · 8 comments
First, thank you for sharing your code.
In w2v.py, I saw your code as follows:
embedding_weights = [np.array([embedding_model[w] if w in embedding_model
else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
for w in vocabulary_inv])]
To look up weights in embedding_model, w must be a word, e.g. "happy".
But in w2v.py, in "for w in vocabulary_inv", w is the index of a word.
Is that a mistake?
The branch "else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)" seems to be executed on every iteration.
Hi,
In "for w in vocabulary_inv", vocabulary_inv is a list of words, not indexes.
Hi,
I appreciate your quick reply.
Here, you mentioned that it is a dict {int: str}.
In "for w in vocabulary_inv", is w a list of words?
Sorry, vocabulary_inv is a list of strings, not a dict, and w is a string (i.e. a word).
Sorry to disturb you again. This still seems strange to me...
In sentiment_cnn.py, vocabulary_inv is a dictionary object {int: str}. It is then passed to train_word2vec as one of its parameters.
vocabulary = imdb.get_word_index()
vocabulary_inv = dict((v, k) for k, v in vocabulary.items())
vocabulary_inv[0] = "<PAD/>"
In w2v.py, I don't see where vocabulary_inv is converted to a list.
I also added print(type(vocabulary_inv)) in w2v.py, and the program printed <class 'dict'>.
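The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the three-entry vocabulary_inv and the plain dict standing in for embedding_model are made up for illustration and are not the objects used in w2v.py.

```python
# Hypothetical stand-ins: a tiny index-to-word mapping and a plain dict
# playing the role of the trained embedding model (keyed by word).
vocabulary_inv = {0: "<PAD/>", 1: "happy", 2: "sad"}
embedding_model = {"happy": [0.1, 0.2], "sad": [0.3, 0.4]}

# Iterating over a dict yields its keys, which here are the integer indexes.
keys_seen = [w for w in vocabulary_inv]

# Integer keys are never found in a word-keyed model, so every lookup misses
# and the random-initialisation branch would be taken each time.
hits_by_key = [w in embedding_model for w in vocabulary_inv]

# Iterating over the values yields the words, so known words are found.
hits_by_word = [w in embedding_model for w in vocabulary_inv.values()]

print(keys_seen)
print(hits_by_key)
print(hits_by_word)
```

If vocabulary_inv were a list of strings instead, the first loop would yield the words directly, which is what the original expression assumes.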
This discrepancy arose after I switched to the new [keras] data source. In the previous major version, the data source was data_helpers.load_data(), which returns vocabulary_inv as a list. I will fix it when I have more time. It should be a dict everywhere.
Thank you very much!!!
I wrote the following code. I know it wastes a little memory...
Is this code a correct way to solve the problem?
vocabulary_inv_list = [vocabulary_inv[i] for i in range(0, len(vocabulary_inv))]
embedding_weights = [np.array([embedding_model[w] if w in embedding_model
else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
for w in vocabulary_inv_list])]
Looks okay. embedding_weights must be a list of length 1 containing an ndarray with shape (len(vocabulary_inv), num_features). It was made a list for compatibility with the keras layer.set_weights() method.
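As a rough check of that shape requirement, the sketch below builds embedding_weights straight from a dict {int: str} without an intermediate list. The toy vocabulary, the dict used in place of embedding_model, and num_features = 4 are all assumptions made for illustration. It also assumes the dict keys run contiguously from 0 to len(vocabulary_inv) - 1.

```python
import numpy as np

# Hypothetical stand-ins for illustration only.
num_features = 4
vocabulary_inv = {0: "<PAD/>", 1: "happy", 2: "sad"}
embedding_model = {
    "happy": np.ones(num_features),
    "sad": np.zeros(num_features),
}

rng = np.random.default_rng(0)

# Walk the indexes in order so that row i holds the vector for word i.
rows = []
for i in range(len(vocabulary_inv)):
    word = vocabulary_inv[i]
    if word in embedding_model:
        rows.append(embedding_model[word])
    else:
        # Unknown words (here only "<PAD/>") get a small random vector.
        rows.append(rng.uniform(-0.25, 0.25, num_features))

# Wrap the matrix in a one-element list, as described above.
embedding_weights = [np.array(rows)]

print(len(embedding_weights), embedding_weights[0].shape)
```

Indexing by i rather than iterating over the dict keeps the row order tied to the word index, which is what an embedding layer looks up.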
Please see the updated version.