codekansas/keras-language-modeling

Evaluation Result Correct?

wailoktam opened this issue · 2 comments

To save time, I set nb_epoch to 2, but the program only displays 1 epoch. I chose that epoch and evaluated it against the test sets. The top-1 precision figures seem to be about 1/10 of what the paper claims. Or do I misunderstand something?

Epoch 1/1
14832/14832 [==============================] - 236s - loss: 0.0297 - val_loss: 0.0154
Best: Loss = 0.0154112447405, Epoch = 1
2016-06-14 08:22:54 :: ----- test1 -----
[====================]Top-1 Precision: 0.049444
MRR: 0.131885
2016-06-14 08:46:11 :: ----- test2 -----
[====================]Top-1 Precision: 0.040000
MRR: 0.124294
2016-06-14 09:09:09 :: ----- dev -----
[====================]Top-1 Precision: 0.053000
MRR: 0.128266
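For reference, the two metrics in the log above can be computed from the rank of the correct answer for each query. This is a minimal sketch with made-up ranks, not the repository's actual evaluation code:

```python
# Hypothetical example: for each query, `ranks` holds the 1-based rank at
# which the correct answer appears in the model's sorted candidate list.
ranks = [1, 3, 2, 1, 5]

# Top-1 precision: fraction of queries whose correct answer is ranked first.
top1 = sum(1 for r in ranks if r == 1) / len(ranks)

# MRR: mean of the reciprocal ranks.
mrr = sum(1.0 / r for r in ranks) / len(ranks)

print(top1)  # 0.4
print(mrr)   # (1 + 1/3 + 1/2 + 1 + 1/5) / 5 ≈ 0.607
```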

The reason is this line of code: `for i in range(1, nb_epoch)`. Since `range` excludes its upper bound, `i` only reaches `nb_epoch - 1`, so one epoch is dropped. You can change the code to `for i in range(1, nb_epoch + 1)`.
Of course, one epoch is not enough.
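The off-by-one is easy to see in isolation (loop structure assumed from the description above):

```python
nb_epoch = 2

# Original loop: range(1, nb_epoch) excludes the upper bound,
# so with nb_epoch = 2 only epoch 1 runs.
epochs_original = list(range(1, nb_epoch))
print(epochs_original)  # [1]

# Fixed loop: range(1, nb_epoch + 1) runs all nb_epoch epochs.
epochs_fixed = list(range(1, nb_epoch + 1))
print(epochs_fixed)  # [1, 2]
```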

Hi, thanks for your response. I have tried running 100 epochs, and the result (top-1 precision) looks close to what is posted here. However, it is still quite different from the figures given in the papers (they report 60+% top-1 precision for dense+cnn+1 max and attention-lstm). Can we reasonably doubt the results reported in those papers, assuming we have done nothing wrong in the implementation here?