openai/coinrun

High variance in mean_score during test

KaiyangZhou opened this issue · 1 comments

Thanks for this code.

I found that the variance in mean_score is quite high when I run the test code multiple times on the same trained model with the same parameters (num_eval=20 and rep=5). For example, I got mean_score = 3.8, 5.2, and 4.6 on three runs, and sometimes mean_score > 6.0 for the same model. Is this normal?

In addition, what values of num_eval and rep would you suggest in order to obtain a fair comparison between methods?

Using those values, you're only evaluating on 5*20=100 levels, so it's not very surprising that there'd be high variance. If you evaluate on 1k levels the variance will be substantially lower. 5k levels should give very low variance.
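The scaling above can be sketched with a toy simulation. This is not CoinRun's actual score distribution; `run_eval` just draws a hypothetical per-level score so the standard error of the mean, which shrinks roughly as 1/sqrt(num_levels), is visible across repeated evaluations:

```python
import random
import statistics

def run_eval(num_levels, rng):
    # Hypothetical per-level score in [0, 10] -- an assumption for
    # illustration only, not CoinRun's real score distribution.
    return statistics.mean(rng.uniform(0, 10) for _ in range(num_levels))

rng = random.Random(0)
for n in (100, 1000, 5000):
    # Repeat the whole evaluation 20 times and measure how much the
    # reported mean score moves between runs.
    scores = [run_eval(n, rng) for _ in range(20)]
    print(f"{n} levels: spread across runs = {statistics.stdev(scores):.3f}")
```

With 100 levels the run-to-run spread is several times larger than with 1k or 5k levels, matching the point above: the swings between 3.8 and 6.0 are expected sampling noise at that sample size.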