NaNs output from the policy network

Question


stevenbinhu21 opened this issue 7 years ago · 2 comments

Hi, I was training the model on one of the minigames. After about 60,000 episodes, the policy's action probabilities were all NaN. I haven't been able to track down the cause yet, because it takes a long time to reach that point when debugging. I was just wondering whether you have encountered this problem before.
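The failure only appears after tens of thousands of episodes, so one way to shorten the debugging loop is to fail fast. You can check the policy output (and ideally the logits and the loss) for NaN or inf on every update and stop with a descriptive error, which keeps the last finite checkpoint and the offending inputs available for inspection. The following is a minimal, framework-agnostic sketch in plain Python. `check_finite` and `NonFiniteError` are hypothetical names, and the values are assumed to be flattened into a list of floats. With NumPy or TensorFlow you would more likely use `np.isfinite` or `tf.debugging.check_numerics` directly.

```python
import math


class NonFiniteError(RuntimeError):
    """Raised as soon as a monitored value contains NaN or +/-inf."""


def check_finite(name, values, step):
    """Return `values` unchanged if every entry is finite, otherwise raise.

    name:   label for the monitored quantity (e.g. "action_probs"); only used
            to make the error message useful.
    values: flat list of floats (hypothetical format; flatten tensors first).
    step:   episode or update counter, reported in the error message.
    """
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        first = bad[0]
        raise NonFiniteError(
            f"{name} has {len(bad)} non-finite value(s) at step {step}, "
            f"first at index {first}: {values[first]!r}"
        )
    return values
```

For example, call `check_finite("action_probs", probs, step=episode)` right after the forward pass, and on the loss before applying gradients. The run then stops at the first bad update instead of thousands of episodes later.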

Answer 1 · 2018-05-10T09:58:26.000Z

Which of the games are you training on? Yes, we also encountered it occasionally in DefeatRoaches and DefeatZerglingsAndBanelings (if I remember correctly). However, it only happened in some runs, and I also wasn't able to identify the source of the problem.
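The thread never pins down the cause. One suspect worth ruling out, offered here as an assumption rather than a confirmed diagnosis, is the step in A2C-style StarCraft II agents that masks out unavailable actions and renormalises the probabilities. If every available action's probability underflows to exactly zero, the renormalisation divides 0 by 0. Likewise, an entropy term computed elementwise as `p * log(p)` evaluates to NaN wherever `p` is 0 in IEEE floating point. The plain-Python sketch below masks in logit space after subtracting the maximum, so the normaliser can never be zero, and skips zero-probability terms in the entropy. `masked_softmax`, `entropy`, and the 0/1 `available` list are hypothetical and assume finite logits.

```python
import math


def masked_softmax(logits, available):
    """Softmax restricted to available actions, computed stably.

    logits:    list of finite floats, one per action.
    available: list of 0/1 flags (hypothetical format), 1 = action allowed.
    Unavailable actions get probability 0.0.
    """
    allowed = [l for l, a in zip(logits, available) if a]
    if not allowed:
        # Fail loudly instead of silently producing 0/0 = NaN.
        raise ValueError("no available actions: cannot normalise policy")
    m = max(allowed)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(l - m) if a else 0.0 for l, a in zip(logits, available)]
    total = sum(exps)  # >= 1.0, since the max allowed logit contributes exp(0)
    return [e / total for e in exps]


def entropy(probs):
    """Entropy that skips p == 0 terms (their limiting contribution is 0).

    Evaluating 0 * log(0) elementwise in a tensor library yields NaN instead.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

The design choice is to never renormalise already-computed probabilities. Masking before exponentiation guarantees at least one term equal to 1.0 in the denominator.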

Answer 2 · 2018-06-21T14:08:57.000Z

I am also encountering NaNs while running DefeatZerglingsAndBanelings and CollectMineralShards. Are there any updates on this issue?