castorini/honk

evaluation on wrong model in train()


I think there is an error in train(): the evaluate() call at the end of training uses the final model, not the best one according to the dev set. The results are probably very similar, though.

The best model is saved, but to get its actual accuracy on the eval set you have to rerun the code with --type eval --input_file "best_model".

You're right that the logic is unusual, but it's more of a potential gotcha than a mistake. I would support changing it to the more standard approach.
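The more standard approach discussed above can be sketched as follows. This is a minimal, stdlib-only illustration and not honk's actual code. The model is a toy dict of weights, and train_one_epoch and dev_accuracy are hypothetical stand-ins for a real training step and dev-set evaluation. The key point is that the best dev checkpoint is restored before the final evaluation, so the evaluation no longer sees the last model by default. In PyTorch, the restore step would typically be model.load_state_dict(...) on the saved best checkpoint.

```python
import copy
import random


def train_one_epoch(model, rng):
    """Hypothetical stand-in for one training epoch: perturb the single weight."""
    model["w"] += rng.uniform(-1.0, 1.0)


def dev_accuracy(model):
    """Hypothetical stand-in for dev-set accuracy: best when w is near 3.0."""
    return max(0.0, 1.0 - abs(model["w"] - 3.0) / 10.0)


def train(n_epochs=20, seed=0):
    rng = random.Random(seed)
    model = {"w": 0.0}
    best_acc, best_state = float("-inf"), None

    for _ in range(n_epochs):
        train_one_epoch(model, rng)
        acc = dev_accuracy(model)
        if acc > best_acc:
            # Snapshot the weights, the analogue of saving "best_model".
            best_acc, best_state = acc, copy.deepcopy(model)

    last_state = copy.deepcopy(model)

    # Restore the best dev checkpoint in place (like load_state_dict)
    # so that the final evaluation uses it instead of the last model.
    model.clear()
    model.update(best_state)
    return model, last_state, best_acc


model, last_state, best_acc = train()
print("best dev acc:", best_acc, "last dev acc:", dev_accuracy(last_state))
```

With this structure, the test-set evaluation at the end of train() would run on the restored best model. Running with --type eval on the saved file would then only be needed to re-evaluate later.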