jadore801120/attention-is-all-you-need-pytorch

Surprising PPL on WMT 17

luffycodes opened this issue · 0 comments

Running the code with n_head set to 1 leads to a validation PPL of 6.65 (all other parameters are the same as in the README). The resulting log is attached below. I'm surprised by such a low PPL, since lower perplexity is better, yet leaving n_head at its default results in a PPL of about 11. Is this behaviour expected?

"[ Epoch 356 ]

  • (Training) ppl: 11.29374, accuracy: 74.314 %, elapse: 0.540 min
  • (Validation) ppl: 6.65451, accuracy: 67.306 %, elapse: 0.006 min
    • [Info] The checkpoint file has been updated."
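
For context on what the reported numbers mean, here is a minimal sketch of how perplexity is typically derived from the cross-entropy loss; the function and values below are illustrative, not taken from the repo's training script:

```python
import math

def perplexity(total_nll: float, n_tokens: int) -> float:
    """Perplexity is the exponential of the average per-token
    negative log-likelihood (cross-entropy, in nats)."""
    return math.exp(total_nll / n_tokens)

# Illustrative check: a validation PPL of ~6.65 corresponds to an
# average cross-entropy of ln(6.65) ~ 1.89 nats per target token,
# while the training PPL of ~11.29 corresponds to ln(11.29) ~ 2.42.
print(round(perplexity(1.895, 1), 2))  # -> 6.65 (approximately)
```

On this reading, the single-head run's average validation loss is roughly ln(11) - ln(6.65) ≈ 0.5 nats per token lower than the default run's, which is what makes the result surprising.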