geek-ai/irgan

A bug in the vocabulary size

In the Question-Answer task, using the provided data, the vocabulary size of the provided pretrained QACNN model is 3231, but the vocabulary size computed by the `build_vocab` function is 3449.
This does not cause any error with the GPU build of TensorFlow 0.12.0, but it raises an "index out of range" error with the CPU build of TensorFlow 0.12.0.
According to tensorflow/tensorflow#5847, this is because the CPU implementation of `embedding_lookup` raises an error when any index is out of range, whereas the GPU implementation silently returns a zero vector for out-of-range indices. This is a minor bug, since we usually train deep models on GPU and the out-of-range words are likely infrequent, so the impact on model performance is small. Still, it would be great to have it fixed.
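
For reference, here is a minimal reproduction sketch of the behaviour (TF 0.12-style graph API; the embedding dimension and the random embedding matrix are placeholders, not the actual QACNN checkpoint):

```python
import numpy as np
import tensorflow as tf

# 3231 rows in the pretrained embedding matrix, while build_vocab
# produces word ids up to 3448. The embedding dimension is a placeholder.
pretrained_vocab_size = 3231
embedding_dim = 100

embeddings = tf.constant(
    np.random.rand(pretrained_vocab_size, embedding_dim).astype(np.float32))

# id 3448 is valid under the 3449-word vocabulary from build_vocab,
# but out of range for the 3231-row pretrained embedding matrix.
word_ids = tf.constant([0, 10, 3448])

lookup = tf.nn.embedding_lookup(embeddings, word_ids)

with tf.Session() as sess:
    # GPU build: returns a zero vector for id 3448.
    # CPU build: raises an "index out of range" error.
    print(sess.run(lookup))
```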
Could you please provide a new pretrained QACNN model with the correct vocabulary size? Or I can open a pull request to fix the bug in `build_vocab`.
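
As an interim workaround, one option is to clamp ids to the pretrained vocabulary before the lookup, so the CPU and GPU builds behave consistently. This is only a sketch; the proper fix is still to make `build_vocab` match the 3231-word pretrained model:

```python
import tensorflow as tf

def safe_embedding_lookup(embeddings, word_ids, pretrained_vocab_size):
    """Clamp word ids that exceed the pretrained vocabulary before the lookup.

    Out-of-range ids are mapped to the last in-range index purely for
    illustration; a dedicated <UNK> index would be a cleaner choice.
    """
    clamped_ids = tf.minimum(word_ids, pretrained_vocab_size - 1)
    return tf.nn.embedding_lookup(embeddings, clamped_ids)
```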