/Proximal-Policy-Optimization

Proximal Policy Optimization (PPO) is a policy gradient method for reinforcement learning. It was motivated by the goal of an algorithm with the data efficiency and reliable performance of Trust Region Policy Optimization (TRPO), while using only first-order optimization.
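To make the idea concrete, here is a minimal sketch of the clipped surrogate objective, the most widely used PPO variant. It is not taken from this repository, which may implement something different. The function name `clipped_surrogate`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names here are hypothetical.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Clipping removes the incentive to move the ratio far from 1 in a single update, which is how this variant approximates a trust region using only first-order gradients.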

Primary Language: Jupyter Notebook
