/ddpg-bipedal

Notes for TensorFlow version

This version is still in progress. It runs, but it does not solve the environment yet. If you have all the required packages installed (such as gym and box2d), you can run it with bash train.sh.

Notes for PyTorch version

The agent finally achieved a mean reward of more than 30 over 100 episodes, just when I was about to give up ;)

The sad part is that it suddenly behaved badly at the end of the last 100 episodes; otherwise, it could have achieved more than 60 :(
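One way to guard against this kind of late collapse is to track the mean reward over the last 100 episodes and keep a checkpoint whenever it reaches a new best. The following is only a minimal sketch: `RewardTracker` is a hypothetical helper, not a class from the notebook, and the actual model saving is left to the caller.

```python
from collections import deque


class RewardTracker:
    """Hypothetical helper: rolling mean of episode rewards over a fixed window."""

    def __init__(self, window=100):
        self.scores = deque(maxlen=window)
        self.best_mean = float("-inf")

    def update(self, episode_reward):
        """Record one episode reward; return (rolling_mean, is_new_best)."""
        self.scores.append(episode_reward)
        mean = sum(self.scores) / len(self.scores)
        # Only compare against the best once the window is full, so early
        # lucky episodes do not count as a "best" mean.
        is_best = len(self.scores) == self.scores.maxlen and mean > self.best_mean
        if is_best:
            self.best_mean = mean
        return mean, is_best
```

In the training loop, one could call `update` after each episode and save the actor and critic weights only when `is_new_best` is true, so a bad final stretch does not overwrite the best agent.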

I tried to improve it further by adding batch normalization, using a priority queue instead of a deque for the replay buffer, and adding noise decay. None of these seemed promising.
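For reference, noise decay could look something like the sketch below. It assumes the exploration noise is an Ornstein-Uhlenbeck process, as is common in DDPG implementations; the notebook's own noise class may differ. The name `DecayingOUNoise` and all default parameter values are illustrative assumptions, not tuned settings.

```python
import random


class DecayingOUNoise:
    """Ornstein-Uhlenbeck noise whose output is multiplied by a decaying scale.

    Hypothetical sketch: parameter defaults are illustrative, not tuned values.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2,
                 decay=0.999, min_scale=0.05, seed=None):
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.decay = decay
        self.min_scale = min_scale
        self.scale = 1.0
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its mean, e.g. at the start of each episode.
        self.state = [self.mu] * self.size

    def sample(self):
        # One discrete step: x <- x + theta * (mu - x) + sigma * N(0, 1).
        self.state = [
            x + self.theta * (self.mu - x) + self.sigma * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        # The decayed scale shrinks the exploration noise added to the action.
        return [self.scale * x for x in self.state]

    def decay_scale(self):
        # Call once per episode; the scale never drops below min_scale.
        self.scale = max(self.min_scale, self.scale * self.decay)
```

With BipedalWalker's four-dimensional action space, this would be created as `DecayingOUNoise(size=4)`, with `sample()` added to the actor's output at each step and `decay_scale()` called at the end of each episode.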

I don't know where the error is: when I save the model and reload it, the agent seems to start all over again. Maybe it has something to do with the replay buffer being cleared when training restarts.
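If the cleared replay buffer is indeed the cause, one thing to try is saving the buffer to disk together with the model and restoring it before training resumes. The sketch below uses only the standard library; `ReplayBuffer` here is a hypothetical stand-in for whatever buffer class the notebook uses, and the tuple layout is an assumption. Other state that is not restored (for example optimizer state or target networks) could have a similar effect, so this is only one possible fix.

```python
import pickle
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

    def save(self, path):
        # Store the capacity too, so the restored deque keeps the same maxlen.
        with open(path, "wb") as f:
            pickle.dump({"capacity": self.memory.maxlen,
                         "items": list(self.memory)}, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            data = pickle.load(f)
        buffer = cls(data["capacity"])
        buffer.memory.extend(data["items"])
        return buffer
```

Calling `buffer.save(...)` whenever the model is saved, and `ReplayBuffer.load(...)` when it is reloaded, lets training resume with the same experience it had before the restart.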

Actor-Critic Methods

Instructions

Open DDPG.ipynb to see an implementation of DDPG with OpenAI Gym's BipedalWalker environment.

Results

Trained Agent