nfmcclure/tensorflow_cookbook

07 NLP: Bag of words. incorrect embedding size?

dmitriibeliakov opened this issue · 3 comments

embedding_size = len([x for x in vocab_processor.transform(texts)])

This only counts the number of sentences, but it should count the number of unique words in the vocabulary. If I'm right, the code should be something like this:

import numpy as np

transformed_texts = np.array([x for x in vocab_processor.transform(texts)])
embedding_size = len(np.unique(transformed_texts))

Instead of 5574, I'm now getting 8206.
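To see why the original line counts sentences rather than vocabulary entries, here is a minimal sketch. The `transform` function below is a hypothetical stand-in for `vocab_processor.transform` (which maps each text to a fixed-length array of word ids, as `tf.contrib.learn.preprocessing.VocabularyProcessor` does); the tiny `texts` and `vocab` data are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for vocab_processor.transform: maps each text to a
# fixed-length array of word ids, padding with 0 (mimics VocabularyProcessor).
def transform(texts, vocab, max_len=4):
    out = []
    for t in texts:
        ids = [vocab.get(w, 0) for w in t.split()][:max_len]
        ids += [0] * (max_len - len(ids))  # pad to max_len
        out.append(ids)
    return np.array(out)

texts = ["spam spam ham", "eggs and spam"]
vocab = {"spam": 1, "ham": 2, "eggs": 3, "and": 4}

transformed = transform(texts, vocab)

# Buggy version: iterating over the transformed texts yields one row per
# sentence, so len() gives the sentence count, not the vocabulary size.
num_sentences = len([x for x in transformed])   # -> 2

# Fixed version: count the distinct word ids that actually appear.
embedding_size = len(np.unique(transformed))    # -> 5 (ids 0 through 4)
```

The distinction matters because the embedding matrix needs one row per vocabulary id, not one per sentence.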

I ran into the same problem. Thank you very much for the post; it helped me resolve it!

Hi @versusnja, thanks for the report, and sorry about the late reply. I'm just now getting around to updating the code and triaging the issues.

I'm certain you are correct. When I get to chapter 7 in the next few months, expect this change to be incorporated.

Thanks again!

This should be updated and fixed now. Thanks!