vocab.txt size doesn't match CNN dataset vocabulary size.

Question

webeng opened this issue 9 years ago · 1 comment

Firstly, thank you for open sourcing this project.

How did you generate the vocab.txt file? It contains 29,406 words.

However, when I counted all the unique words in the CNN dataset, I found 119,567.
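For anyone trying to reproduce the two numbers, here is a minimal sketch of one way to count them. It assumes vocab.txt holds one token per line (if each line also carries a frequency, only the line count is meaningful), and it assumes the stories are plain-text files matching a `*.story` pattern; the directory layout, file pattern, and helper names are hypothetical. Tokenization and lowercasing change the unique-word count considerably, so both counts should be made with the same rules before comparing.

```python
from collections import Counter
from pathlib import Path


def vocab_file_size(path):
    """Count non-empty lines in a vocab file (assumes one token per line)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def count_unique_words(texts, lowercase=True):
    """Return a Counter of whitespace-separated tokens across all texts.

    Splitting on whitespace keeps punctuation attached to words, so
    "dog." and "dog" count as different tokens; a different tokenizer
    would give a different total.
    """
    counts = Counter()
    for text in texts:
        if lowercase:
            text = text.lower()
        counts.update(text.split())
    return counts


def iter_story_texts(directory, pattern="*.story"):
    """Yield the text of each file in `directory` matching the hypothetical `pattern`."""
    for p in sorted(Path(directory).glob(pattern)):
        yield p.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the dataset and vocab.txt live.
    counts = count_unique_words(iter_story_texts("cnn/stories"))
    print("unique words in dataset:", len(counts))
    print("entries in vocab.txt:", vocab_file_size("vocab.txt"))
```

Because `count_unique_words` returns a `Counter`, `counts.most_common(n)` makes it easy to check whether vocab.txt matches the n most frequent words, which would indicate that the vocabulary was truncated.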

Answer 1 · 2016-05-25T08:39:47.000Z

According to the original authors, the vocabulary size should be about 120K.