muupan/async-rl

Some evaluation results are missing

muupan opened this issue · 2 comments

In scores.txt of the currently uploaded trained model, the evaluation results at steps 55000000 and 56000000 are missing.

54000000 41383.44816946983 448.7 408.0 133.6006071177157

I don't know why this happens or whether it affects performance. I need to check.

I found that the missing evaluations are caused by processes getting stuck in evaluate_performance(). It is possible that some policies fail to start playing Breakout, which prevents episodes from ever terminating. If so, it might be necessary to use epsilon-greedy-like action selection in addition to sampling from softmax policies in test runs.
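A rough sketch of what that action selection could look like, not the code in this repository. `sample_action`, its `epsilon` default, and the `rng` parameter are hypothetical names chosen for illustration; `probs` is assumed to be the softmax output for the current state as a plain list of probabilities.

```python
import random


def sample_action(probs, epsilon=0.05, rng=random):
    """Pick an action index from softmax probabilities with an epsilon fallback.

    With probability `epsilon` the action is drawn uniformly at random, so a
    policy that puts (almost) no mass on the action that starts the game
    still gets a chance to take it. Otherwise the action is sampled from
    `probs` as usual.
    """
    n_actions = len(probs)
    if rng.random() < epsilon:
        # Uniform exploration step, independent of the policy output.
        return rng.randrange(n_actions)
    # Regular sampling from the softmax policy.
    return rng.choices(range(n_actions), weights=probs, k=1)[0]
```

With `epsilon=0` this reduces to plain softmax sampling, so the current test behaviour would stay available as a special case.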

This didn't occur for Space Invaders. For Breakout, we might need to force long episodes to finish.
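One way to force long episodes to finish would be a hard cap on the number of steps per evaluation episode. The following is only a sketch under assumptions: `run_capped_episode`, `max_steps`, and the `reset()`/`step()` environment interface are hypothetical and do not correspond to the actual environment wrapper used here. `NeverEndingEnv` is a toy stand-in for a game that never reaches a terminal state.

```python
class NeverEndingEnv:
    """Toy environment that never terminates (hypothetical, for illustration)."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (next_state, reward, done); done is always False here.
        return 0, 0.0, False


def run_capped_episode(env, policy, max_steps=10000):
    """Run one episode, stopping after `max_steps` even if it has not ended.

    Returns (total_reward, steps_taken, truncated), where `truncated` is True
    when the episode was cut off by the step cap rather than by the game.
    """
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        state, reward, done = env.step(policy(state))
        total_reward += reward
        if done:
            return total_reward, t + 1, False
    # The cap was reached without a terminal state.
    return total_reward, max_steps, True


# Usage: the stuck game is cut off after 50 steps instead of hanging.
total, steps, truncated = run_capped_episode(
    NeverEndingEnv(), lambda state: 0, max_steps=50
)
```

Returning the `truncated` flag would let the evaluation code log or exclude capped episodes instead of silently mixing them into the scores.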