/rl_pong

Train an RL agent to play Pong using Proximal Policy Optimization (PPO)

Primary language: Jupyter Notebook. License: MIT.


Output demo

The player on the left is the standard computer-controlled opponent, while the player on the right is the trained RL agent.

Using REINFORCE

[Demo animation: output_demo_reinforce]
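
For readers unfamiliar with REINFORCE, the following is a minimal sketch of its loss in plain Python. It is not taken from this repository's notebook: the function names `discounted_returns` and `reinforce_loss`, the discount factor `gamma`, and the use of raw returns without a baseline are all assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step can reuse the next step's return.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Mean of -log pi(a_t | s_t) * G_t over one episode (hypothetical helper).

    Minimizing this value raises the probability of actions that were
    followed by high discounted returns.
    """
    returns = discounted_returns(rewards, gamma)
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)
```

In Pong the reward is typically zero on most frames and nonzero only when a point is scored, so the backward pass in `discounted_returns` is what spreads credit for a point over the frames that led up to it.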

Using PPO

[Demo animation: output_demo_ppo]
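
As a rough illustration of how PPO differs from REINFORCE, here is a minimal sketch of the clipped surrogate objective in plain Python. It is not the notebook's implementation: the function name `ppo_clipped_objective`, the clipping range `epsilon=0.2`, and the assumption that advantages are already computed are all choices made for this example.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (hypothetical helper).

    For each sample, the probability ratio pi_new / pi_old is clipped to
    [1 - epsilon, 1 + epsilon], and the smaller of the clipped and unclipped
    terms is kept. The result is meant to be maximized.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Ratio of new to old action probability, computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Taking the minimum removes any incentive to move the ratio
        # further outside the clipping range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping is what allows several gradient steps on the same batch of trajectories: once the new policy has moved far enough from the old one, the objective stops rewarding further movement in that direction.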