The quick convergence proof for CartPole-v0

Question


zhouwenchi opened this issue 6 months ago · 0 comments

Hello, thank you for sharing. Your work has been very helpful to me!
I ran into some problems while training in the CartPole environment. Although training now runs faster, the reward keeps decreasing in the later stages of training, as shown in the figure. My hyperparameters are the same as in your example.
Can you tell me where the quick convergence proof is in the code? Thank you!