thomasmesnard/DeepMind-Teaching-Machines-to-Read-and-Comprehend

vocab.txt size doesn't match CNN dataset vocabulary size.

webeng opened this issue · 1 comment

Firstly, thank you for open sourcing this project.

How did you generate the vocab.txt file? It contains 29,406 words.

However, when I counted all the unique words in the CNN dataset, I got 119,567.
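For reference, a count like this can be reproduced with something along the lines of the sketch below. It is a minimal sketch under stated assumptions, not the method used to build vocab.txt. It assumes each `.question` file follows the DeepMind layout (URL on line 0, context on line 2, query on line 4, answer on line 6, entity mappings from line 8). It also assumes tokens are split on whitespace only, with no lowercasing. The function name `count_cnn_vocab` and the `min_count` parameter are hypothetical. Tokenization choices, and whether the URL and entity-mapping lines are included, can change the total noticeably.

```python
from collections import Counter
from pathlib import Path


def count_cnn_vocab(root, min_count=1):
    """Hypothetical helper: count distinct tokens in CNN .question files.

    Assumptions (not confirmed by the repo): files use the DeepMind layout
    (0 URL, 2 context, 4 query, 6 answer, 8+ entity map), and tokens are
    whitespace-separated with no lowercasing or punctuation stripping.
    Returns (set of tokens seen at least `min_count` times, full Counter).
    """
    counts = Counter()
    for path in Path(root).rglob("*.question"):
        lines = path.read_text(encoding="utf-8").split("\n")
        # Only count context, query and answer; skip URL and entity mappings.
        for idx in (2, 4, 6):
            if idx < len(lines):
                counts.update(lines[idx].split())
    vocab = {tok for tok, c in counts.items() if c >= min_count}
    return vocab, counts
```

Usage: `vocab, _ = count_cnn_vocab("cnn/questions")` followed by `print(len(vocab))`, where the directory path is only an example. Raising `min_count` shows how much a frequency cutoff shrinks the vocabulary. Whether vocab.txt was built with such a cutoff is not stated here.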

According to the original authors, the vocabulary size should be about 120K.