NaNs output from the policy network
stevenbinhu21 opened this issue · 2 comments
stevenbinhu21 commented
Hi, I was training the model on one of the minigames. After about 60,000 episodes, the policy's action probabilities all came out as NaN. I haven't been able to track down the problem yet, since it takes a long time to reach that point for debugging. Have you encountered this problem before?
simonmeister commented
Which of the games are you training on? Yes, we also encountered it occasionally in DefeatRoaches and DefeatZerglingsAndBanelings (if I remember correctly). However, it only happened in some runs, and I wasn't able to identify the source of the problem either.
jaejaywoo commented
I am also encountering NaNs when running DefeatZerglingsAndBanelings and CollectMineralShards. Are there any updates on this issue?