
Bootstrap DQN

This repo contains our implementation of a Bootstrapped DQN for ALE games, with options to add a Randomized Prior, a Dueling architecture, and Double DQN.

Deep Exploration via Bootstrapped DQN

Randomized Prior Functions for Deep Reinforcement Learning
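
At the core of these variants is a set of K independent Q-value heads over a shared trunk, optionally combined with a fixed, randomly initialized prior network as in the second paper above. Below is a minimal PyTorch sketch of that idea; the class and parameter names (e.g. prior_scale) are ours for illustration, x is assumed to come from a shared convolutional trunk, and none of this is the exact code in this repo:

    import torch.nn as nn

    class BootstrappedHeads(nn.Module):
        """K independent Q-value heads over a shared feature vector."""
        def __init__(self, in_features, n_actions, n_heads):
            super().__init__()
            self.heads = nn.ModuleList(
                nn.Linear(in_features, n_actions) for _ in range(n_heads)
            )

        def forward(self, x):
            # One Q-value tensor of shape (batch, n_actions) per head.
            return [head(x) for head in self.heads]

    class BootstrappedDQNWithPrior(nn.Module):
        def __init__(self, in_features, n_actions, n_heads=10, prior_scale=1.0):
            super().__init__()
            self.online = BootstrappedHeads(in_features, n_actions, n_heads)
            self.prior = BootstrappedHeads(in_features, n_actions, n_heads)
            self.prior_scale = prior_scale
            for p in self.prior.parameters():
                p.requires_grad_(False)  # the prior stays at its random init

        def forward(self, x):
            # Q_k(s, .) = f_k(s, .) + beta * p_k(s, .); only f_k is trained.
            return [q + self.prior_scale * pq
                    for q, pq in zip(self.online(x), self.prior(x))]

During training, one head is typically sampled at the start of each episode and followed greedily for that whole episode, which is what drives deep exploration.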

Some results on Breakout

[GIF: Breakout evaluation playback]

This GIF depicts the orange agent from below winning its first game of Breakout and eventually winning a second. The agent reaches a high score of 830 in this evaluation. There are several gaps in playback to keep the file size down; we show agent steps [1000-1500], [2400-2600], [3000-4500], and [16000-16300].

Comparison:

  • (blue) DQN with epsilon-greedy exploration annealed from 1 to 0.01 (a minimal annealing sketch follows this list)
  • (orange) Bootstrapped DQN with epsilon-greedy exploration annealed from 1 to 0.01
  • (green) Bootstrapped DQN without epsilon-greedy exploration
  • (red) Bootstrapped DQN with a randomized prior
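
For concreteness, here is a sketch of that linear annealing schedule; the function name and the one-million-step annealing horizon are illustrative assumptions, not values taken from this repo:

    def linear_epsilon(step, start=1.0, end=0.01, anneal_steps=1_000_000):
        # Decay epsilon linearly over anneal_steps agent steps, then hold
        # at the final value. anneal_steps is an assumed horizon, not a
        # setting read from this codebase.
        frac = min(step / anneal_steps, 1.0)
        return start + frac * (end - start)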

All agents were implemented as Dueling, Double DQNs. The x-axis in these plots, "steps", is the number of states the agent has observed so far in training. Multiply by 4 to account for the frame skip of 4 and obtain the total number of frames the emulator has progressed.
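
For reference, a sketch of the standard dueling head and double-DQN target that these agent variants build on; this is the textbook formulation, not a copy of this repo's code:

    import torch.nn as nn

    class DuelingHead(nn.Module):
        """Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
        def __init__(self, in_features, n_actions):
            super().__init__()
            self.value = nn.Linear(in_features, 1)
            self.advantage = nn.Linear(in_features, n_actions)

        def forward(self, x):
            v = self.value(x)
            a = self.advantage(x)
            return v + a - a.mean(dim=1, keepdim=True)

    def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
        # The online net chooses the next action; the target net evaluates
        # it, which reduces the overestimation bias of plain DQN.
        next_action = next_q_online.argmax(dim=1, keepdim=True)
        next_value = next_q_target.gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_value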

Our agents are sent a terminal signal at the end of each life. They face a deterministic state progression after a random number (< 30) of no-op steps at the beginning of each episode.
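
Both conventions are commonly implemented as environment wrappers. Here is a sketch assuming the classic gym-style API (reset() returns obs; step(a) returns (obs, reward, done, info)) and an ALE env exposing env.unwrapped.ale.lives(); the class names are ours, not this repo's:

    import random

    class TerminalOnLifeLoss:
        """Marks done=True on life loss without resetting the emulator."""
        def __init__(self, env):
            self.env = env
            self.lives = 0

        def reset(self):
            obs = self.env.reset()
            self.lives = self.env.unwrapped.ale.lives()
            return obs

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            lives = self.env.unwrapped.ale.lives()
            if lives < self.lives:
                done = True  # end of life looks terminal to the agent
            self.lives = lives
            return obs, reward, done, info

    class NoopStarts:
        """Takes a random number (< max_noops) of no-ops after reset."""
        def __init__(self, env, max_noops=30, noop_action=0):
            self.env = env
            self.max_noops = max_noops
            self.noop_action = noop_action

        def reset(self):
            obs = self.env.reset()
            # Randomize the otherwise deterministic start state.
            for _ in range(random.randrange(self.max_noops)):
                obs, _, done, _ = self.env.step(self.noop_action)
                if done:
                    obs = self.env.reset()
            return obs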

[Plot: Breakout evaluation reward vs. steps for the four agents]

Some results on Pong

Here are some results on Pong with Bootstrapped DQN with a Randomized Prior. An optimal strategy is learned within 2.5 million steps.

[GIF: Pong agent playback]

[Plot: Pong agent score in evaluation, reward vs. steps]

Some results on Freeway

Here are some results on Freeway with Bootstrapped DQN with a Randomized Prior. The randomized prior allowed us to solve this "hard exploration" problem within 4 million steps.

[GIF: Freeway agent playback]

[Plot: Freeway agent score in evaluation, reward vs. steps]

Dependencies

  • atari-py installed from https://github.com/kastnerkyle/atari-py
  • torch 1.0.1.post2
  • cv2 (OpenCV) 4.0.0

References

We referenced several excellent examples and blog posts while building this codebase:

  • Discussion and debugging with Kyle Kastner
  • Fabio M. Graetz's DQN
  • hengyuan-hu's Rainbow
  • Dopamine's baseline