bigscience-workshop/multilingual-modeling

Inconsistent Evaluation Results

yongzx opened this issue · 0 comments

I am getting different results depending on whether I run training and evaluation together or separately.
Rerunning evaluation after training finishes (by removing --do_train) gives a better result than running training + eval in a single invocation.
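
To make the two modes concrete: assuming the usual Hugging Face `Trainer` flow, the combined run evaluates whatever weights are in memory when `train()` returns, while an eval-only rerun reloads the model from the checkpoint saved on disk. If the last on-disk checkpoint is not the same as the final in-memory state (e.g. `save_steps` did not land on the final step, or `load_best_model_at_end` differs between runs), the two evaluations see different weights. The sketch below is a hypothetical illustration in plain PyTorch, not this repo's actual training code; all names (`TinyRegressor`, `ckpt.pt`) are made up.

```python
import torch
from torch import nn

# Toy stand-in for the trained model (hypothetical, for illustration only).
class TinyRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Linear(1, 1)

    def forward(self, x):
        return self.w(x)

torch.manual_seed(0)
model = TinyRegressor()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 1)
y = 3 * x

save_every = 10  # analogous to --save_steps
for step in range(1, 26):
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % save_every == 0:
        torch.save(model.state_dict(), "ckpt.pt")  # last save is at step 20

# "train + eval together": evaluates the in-memory weights from step 25.
with torch.no_grad():
    print("together:", ((model(x) - y) ** 2).mean().item())

# "eval separately": a fresh run reloads from disk, i.e. the step-20 weights.
reloaded = TinyRegressor()
reloaded.load_state_dict(torch.load("ckpt.pt"))
with torch.no_grad():
    print("separate:", ((reloaded(x) - y) ** 2).mean().item())
```

The two printed losses differ because different weights get evaluated; in a real run the gap could go in either direction, so it may be worth checking which checkpoint the eval-only rerun actually loads and whether `--save_steps` / `--load_best_model_at_end` match between the two invocations.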