/rl_bipedal

Solving OpenAI gym's Bipedal environment using Proximal Policy Optimization (PPO)

Primary LanguageJupyter NotebookMIT LicenseMIT

About

Solving OpenAI's Bipedal Walker environment using Proximal Policy Optimization (PPO) algorithm. For comparison an implementation with DDPG algorithm is also provided.

Results

Trained Agent

Project Structure

2 different solutions have been implemented in this repo

  • ppo: directory for ppo agent
    • ppo_continuous.py: code for running the ppo agent for the bipedal environment
    • ppo_bipedal.ipynb: jupyter notebook for agent training and visualisation
    • utils.py: utility functions
    • deep_rl: directory with modular functions for the PPO agent
  • ddpg: directory for ddpg agent
    • DDPG.ipynb: jupyter notebook for training the agent
    • ddpg_agent.py: code for the agent model, experience replay and OU noise
    • model.py: actor and critic networks

Instructions

Open ppo/ppo_bipedal.ipynb to see an implementation of PPO with OpenAI Gym's BipedalWalker environment.