In PPO.ipynb, the positions of the action loss epoch and the value loss epoch need to be swapped
wadx2019 opened this issue
wadx2019 commented
In PPO.ipynb, the positions of the action loss epoch and the value loss epoch need to be swapped. I also suggest using RMSprop as the optimizer and reducing the learning rate so that these RL models converge more easily.
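For reference, here is a minimal pure-Python sketch of the standard RMSprop update. It is not the notebook's code. Which framework PPO.ipynb uses is an assumption. If it is PyTorch, the switch would look like `torch.optim.RMSprop(model.parameters(), lr=...)`. The helper `rmsprop_minimize`, the toy objective `toy_grad`, and the values `lr=1e-2` and `steps=2000` are hypothetical and only illustrate the rule. `alpha=0.99` and `eps=1e-8` match PyTorch's documented defaults.

```python
import math
from typing import Callable, List


def rmsprop_minimize(
    grad_fn: Callable[[List[float]], List[float]],
    params: List[float],
    lr: float = 1e-2,
    alpha: float = 0.99,
    eps: float = 1e-8,
    steps: int = 1000,
) -> List[float]:
    """Plain RMSprop (hypothetical helper): keep a running average of the
    squared gradient per parameter and divide each gradient by its root."""
    params = list(params)
    sq_avg = [0.0] * len(params)  # running average of g**2, one per parameter
    for _ in range(steps):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            sq_avg[i] = alpha * sq_avg[i] + (1.0 - alpha) * g * g
            # eps is added after the square root, as in PyTorch's RMSprop
            params[i] -= lr * g / (math.sqrt(sq_avg[i]) + eps)
    return params


# Hypothetical toy objective f(x, y) = (x - 3)^2 + 10 * (y + 1)^2,
# whose minimum is at (3, -1); the two coordinates have very different scales.
def toy_grad(p: List[float]) -> List[float]:
    x, y = p
    return [2.0 * (x - 3.0), 20.0 * (y + 1.0)]


result = rmsprop_minimize(toy_grad, [0.0, 0.0], lr=1e-2, steps=2000)
print(result)
```

This connects to the advice about lowering the learning rate. Because `sq_avg` always holds at least `(1 - alpha) * g**2`, a single RMSprop step can never exceed roughly `lr / sqrt(1 - alpha)` in magnitude, whatever the raw gradient scale. With the values above, that limit is 0.1. Reducing `lr` therefore directly caps how far each update can move the parameters.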