vocab.txt size doesn't match CNN dataset vocabulary size.
webeng opened this issue · 1 comment
webeng commented
Firstly, thank you for open sourcing this project.
How did you generate the vocab.txt file? It contains 29,406 words.
However, when I counted the unique words in the CNN dataset, I found 119,567.
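For reference, a count like this can be reproduced with a minimal sketch along the following lines. It is not the project's actual preprocessing: it assumes plain whitespace tokenization, UTF-8 text files, and a vocab.txt with one token per line (first field). The helper names, the `min_count` cutoff, and the paths in the usage note are hypothetical. A different tokenizer, lowercasing, or a frequency cutoff would each change the resulting number.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, Set


def count_tokens(texts: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens over all texts (assumed tokenization)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def vocab_size(counts: Counter, min_count: int = 1) -> int:
    """Number of distinct tokens seen at least `min_count` times."""
    return sum(1 for c in counts.values() if c >= min_count)


def read_texts(directory: str, pattern: str = "*") -> Iterator[str]:
    """Yield the contents of every regular file in `directory` matching `pattern`."""
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            yield path.read_text(encoding="utf-8")


def load_vocab(path: str) -> Set[str]:
    """Read a vocabulary file, assuming the token is the first field on each line."""
    with open(path, encoding="utf-8") as f:
        return {line.split()[0] for line in f if line.strip()}
```

Usage would look roughly like `counts = count_tokens(read_texts("path/to/cnn"))`, followed by comparing `vocab_size(counts)` with `len(load_vocab("vocab.txt"))`. Raising `min_count` shows how much a frequency cutoff alone would shrink the vocabulary.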
webeng commented
According to the original authors, the vocabulary should contain ~120K words.