/deeprl-ppo

An implementation of proximal policy optimization (PPO) for Atari in TensorFlow.

Primary language: Python

This is an implementation of proximal policy optimization that I wrote while getting up to speed in deep RL under a grant from the Machine Intelligence Research Institute. I intentionally left in many of my debug notes, in the hope that they will help anyone attempting a similar project. See also my postmortem for a breakdown of the (known) errors made during the implementation.

If all prerequisites are installed, type "python ppo.py" to run in the default configuration on Pong.

Here's a graph of performance on Pong with the default configuration:

[Figure: PPO performance graph for Pong]
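
For readers new to PPO, the heart of the algorithm is its clipped surrogate objective. The snippet below is a minimal, framework-free sketch of that objective, not code taken from ppo.py: the function name `ppo_clip_loss`, its arguments, and the `clip_eps=0.2` default are assumptions chosen for illustration, and the actual implementation in this repository (written in TensorFlow) may differ in details such as value and entropy terms.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (to be minimized) over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range (0.2 is an assumed, commonly used value).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Keep the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the objective is maximized.
    return -total / len(advantages)
```

With a positive advantage, the clip removes any incentive to push the ratio above 1 + clip_eps; with a negative advantage, the unclipped term is kept when it is worse, so large harmful policy changes are still penalized.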