07 NLP: Bag of words. incorrect embedding size?
dmitriibeliakov opened this issue · 3 comments
dmitriibeliakov commented
embedding_size = len([x for x in vocab_processor.transform(texts)])
This only counts the number of sentences; I believe it should count the number of unique words. If I'm right, the code should be:
import numpy as np

transformed_texts = np.array([x for x in vocab_processor.transform(texts)])
embedding_size = len(np.unique(transformed_texts))
Instead of 5574, I'm now getting 8206.
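To illustrate the difference, here is a minimal NumPy sketch with made-up word-id arrays standing in for the output of `vocab_processor.transform(texts)` (the ids and sentences are hypothetical, not from the book's SMS data):

```python
import numpy as np

# Hypothetical transform() output: one row of word ids per sentence
# (3 sentences, ids drawn from a 7-id vocabulary including padding id 0).
transformed_texts = np.array([
    [1, 2, 3, 0],   # sentence 1, padded with 0
    [2, 4, 5, 0],   # sentence 2
    [1, 5, 6, 0],   # sentence 3
])

# The original line counts sentences, not vocabulary entries:
embedding_size_wrong = len([x for x in transformed_texts])  # -> 3

# Counting unique ids gives the vocabulary size instead:
embedding_size = len(np.unique(transformed_texts))          # -> 7 (ids 0..6)

print(embedding_size_wrong, embedding_size)
```

With the real data this is exactly the gap reported above: the sentence count (5574) versus the number of distinct word ids (8206).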
xu94-nlp commented
I ran into the same problem. Thanks for the post, it helped me fix it!
nfmcclure commented
Hi @versusnja, thanks for the report, and sorry about the late reply. I'm just now getting around to updating the code and triaging the issues.
I'm certain you are correct, and when I get to chapter 7 in the next few months, expect this change to be incorporated.
Thanks again!
nfmcclure commented
This should be updated and fixed now. Thanks!