Proximal Policy Optimisation (PPO) PyTorch implementation for the inverted double pendulum problem
Primary language: Jupyter Notebook
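
For readers unfamiliar with PPO, the following is a minimal sketch of the clipped surrogate objective that gives the algorithm its name. It is not taken from this repository: the function name `ppo_clip_loss`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration, and it uses plain Python rather than PyTorch so it runs without dependencies.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (to be minimised) over a batch of samples.

    Hypothetical helper for illustration only; not the repository's code.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the surrogate is maximised, while a loss is minimised.
    return -total / len(advantages)


# One sample whose action became more likely under the new policy,
# with a positive advantage: the ratio is clipped at 1 + clip_eps.
loss = ppo_clip_loss([1.0], [0.0], [1.0])
```

In a PyTorch version the same arithmetic would typically be written with tensor operations so that gradients flow through the new log-probabilities; the clipping keeps a single update from moving the policy too far from the one that collected the data.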