07 NLP: Bag of words. incorrect embedding size?
dmitriibeliakov opened this issue · 3 comments
dmitriibeliakov commented
embedding_size = len([x for x in vocab_processor.transform(texts)])
This only counts the number of sentences; I believe it should count the number of unique words. If I'm right, the code should be:
import numpy as np

transformed_texts = np.array([x for x in vocab_processor.transform(texts)])
embedding_size = len(np.unique(transformed_texts))
Instead of 5574, I'm now getting 8206.
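To illustrate the difference, here is a minimal NumPy sketch with made-up word-id arrays standing in for the output of `vocab_processor.transform(texts)` (the ids and sentences are hypothetical, not from the book's SMS data):

```python
import numpy as np

# Hypothetical transform() output: one row of word ids per sentence
# (3 sentences, ids drawn from a 7-id vocabulary including padding id 0).
transformed_texts = np.array([
    [1, 2, 3, 0],   # sentence 1, padded with 0
    [2, 4, 5, 0],   # sentence 2
    [1, 5, 6, 0],   # sentence 3
])

# The original line counts sentences, not vocabulary entries:
embedding_size_wrong = len([x for x in transformed_texts])  # -> 3

# Counting unique ids gives the vocabulary size instead:
embedding_size = len(np.unique(transformed_texts))          # -> 7 (ids 0..6)

print(embedding_size_wrong, embedding_size)
```

With the real data this is exactly the gap reported above: the sentence count (5574) versus the number of distinct word ids (8206).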
xu94-nlp commented
I ran into the same problem. Thanks for the post, it helped me fix it!
nfmcclure commented
Hi @versusnja, thanks for the report, and sorry about the late reply. I'm just now getting around to updating the code and triaging the issues.
I'm certain you are correct, and when I get to chapter 7 in the next few months, expect this change to be incorporated.
Thanks again!
nfmcclure commented
This should be updated and fixed now. Thanks!