simple-A2C-PPO

Actor-critic agent trained with PPO on OpenAI's Procgen Benchmark (PyTorch). Built from scratch.

Primary language: Jupyter Notebook

Actor-Critic with PPO

For an intuitive guide to the mechanics of actor-critic methods, check out the accompanying comic.
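As a rough illustration of the core actor-critic mechanic, the sketch below shows how a critic's value estimates can be turned into advantages that tell the actor which actions turned out better than expected. It assumes Generalized Advantage Estimation (GAE), a common choice in PPO implementations; whether the notebook uses exactly this scheme is an assumption. The function name `compute_gae` is hypothetical, and plain Python lists stand in for PyTorch tensors to keep the example framework-free.

```python
def compute_gae(rewards, values, dones, last_value, gamma, lam):
    """Generalized Advantage Estimation over one rollout of a single environment.

    rewards, values, dones: lists of length T for timesteps 0..T-1.
    dones[t] is True if the episode ended at step t.
    last_value: the critic's estimate V(s_T) for the state after the final step.
    Returns (advantages, returns); returns = advantages + values are the critic targets.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated estimate from the step after it.
    for t in reversed(range(T)):
        # Do not bootstrap from the next state across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # TD error: how much better this step went than the critic predicted.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (by gamma * lam) sum of future TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting `lam=0` reduces each advantage to the one-step TD error, while `lam=1` gives the full discounted return minus the critic's baseline; values in between trade bias for variance. `gamma` and `lam` are left without defaults on purpose, since the values the notebook uses come from the paper.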

The notebook is designed for readability and exploration rather than production, and uses a single GPU. For an industrial-strength PPO in PyTorch, check out ikostrikov's implementation. For the 'definitive' implementation of PPO, check out OpenAI Baselines (TensorFlow). For outstanding resources on RL, check out OpenAI's Spinning Up.

The notebook reproduces results from OpenAI's procedurally generated environments and the corresponding paper (Cobbe et al. 2019). All hyperparameters are taken directly from the paper. Unless otherwise noted, everything is built from scratch to build intuition.
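The actor in this project is updated with PPO. As a minimal sketch of PPO's core idea, the function below computes the clipped surrogate loss for a batch of sampled actions: the probability ratio between the new and old policy is clipped so a single update cannot move the policy too far. The name `ppo_clip_loss` is hypothetical, the `clip_eps=0.2` default is a common choice rather than a value read from the notebook, and plain Python lists stand in for PyTorch tensors.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a non-empty batch.

    Each argument is a list of floats with one entry per sampled action:
    log pi_new(a|s), log pi_old(a|s), and the advantage estimate A(s, a).
    """
    if not advantages:
        raise ValueError("batch must be non-empty")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space for stability.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # PPO maximizes the objective, so return its negative mean for a minimizer.
    return -total / len(advantages)
```

For example, `ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])` returns `-2.0`: with identical old and new log-probabilities every ratio is 1, so the loss is just the negated mean advantage. In PyTorch the same expression would typically be written with tensor ops such as `torch.exp`, `torch.clamp`, and `torch.min` so gradients flow back into the policy network.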