DeepRLExercises

Usage

For example, to train an agent for Pong on GPU 0:

$ cd src/
$ CUDA_VISIBLE_DEVICES=0 python main_dqn.py --env_name PongNoFrameskip-v4 --replay_memory_size 1000000 --loss_name mse_loss --optim_name Adam --lr 1e-4 --batch_size 32
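
The `--replay_memory_size` and `--batch_size` flags refer to DQN's experience replay. The sketch below shows a fixed-capacity replay memory in plain Python, assuming transitions are stored as `(state, action, reward, next_state, done)` tuples; the class name `ReplayMemory` is hypothetical and not necessarily what `main_dqn.py` uses.

```python
import random
from collections import deque


class ReplayMemory:
    """Hypothetical fixed-capacity FIFO buffer of transitions."""

    def __init__(self, capacity):
        # Once full, appending silently drops the oldest transition.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, as in the original DQN.
        # list() copies the buffer (O(n)); fine for a sketch, slow at 1e6.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(i, 0, 0.0, i + 1, False)
batch = memory.sample(2)
```

With a capacity of 3 and 5 pushes, only the three most recent transitions remain.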

NOTE

src/ contains code for Deep Q Network (DQN) solutions to Atari games (Breakout and Pong).
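
At the core of DQN is the one-step bootstrapped target that the Q-network is regressed towards. The sketch below computes it for a batch in plain Python, assuming a discount factor `gamma=0.99` (the value used in the 2015 paper) and that `next_q_values` come from a separate target network; `td_targets` is a hypothetical helper name, not taken from src/.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a' Q_target(s', a') for each transition.

    Terminal transitions (done=True) do not bootstrap, so y = r.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


ys = td_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.9)
```

Here the first transition bootstraps from its best next action (`1.0 + 0.9 * 2.0`), while the terminal second transition keeps only its reward.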

Results

Evaluations use greedy policies derived from the learned agents.
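
A greedy policy simply picks the action with the highest Q-value. The sketch below writes it as the `epsilon = 0` case of epsilon-greedy action selection (the exploration scheme DQN uses during training); `epsilon_greedy` is a hypothetical helper operating on a plain list of Q-values for one state.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the argmax action.

    epsilon=0.0 gives the purely greedy policy used for evaluation.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


greedy = epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.0)
explore = epsilon_greedy([0.1, 0.5, 0.2], epsilon=1.0, rng=random.Random(0))
```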

Breakout

Deep Q Network
Trained for 28,400 episodes (ideally this should be reported as the number of frames the agent was trained on).
Main modifications to the settings reported in the 2015 DQN paper (Mnih et al., 2015):
- Adam(lr=3e-5)
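
For reference, the sketch below shows a single Adam update on a scalar parameter with `lr=3e-5`. It assumes the common default hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`, the PyTorch defaults); the README does not state which values were used, and `adam_step` is a hypothetical helper, not the optimizer code in src/.

```python
import math


def adam_step(theta, grad, state, lr=3e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; state = (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, (m, v, t)


theta, state = adam_step(1.0, 2.0, (0.0, 0.0, 0))
```

On the first step the bias-corrected update is about `lr * sign(grad)`, so the parameter moves by roughly `3e-5` regardless of the gradient's magnitude.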

Pong

Deep Q Network, with the same settings as for Breakout.
The accompanying figures show:
- left & middle: behaviors learned by the DQN agent. Eventually, the agent appeared to learn to exploit the opponent's behavior and "crack" the Pong game (middle).
- right: learning curves (horizontal axis: episodes, vertical axis: total reward per episode)
  - green line: SmoothL1Loss, Adam(lr=3e-5); exactly the same settings as the Breakout agent.
  - gray line: MSELoss, RMSprop(lr=1e-4, momentum=0.)
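
The two curves differ mainly in the loss on the TD error. The sketch below compares them in plain Python, assuming `beta=1.0` for the smooth L1 (Huber) loss, which matches PyTorch's `SmoothL1Loss` default; both helpers are hypothetical re-implementations for illustration. For errors of magnitude at least `beta`, the smooth L1 gradient is bounded, which corresponds to the error clipping used in the 2015 paper, whereas MSE grows quadratically.

```python
def smooth_l1_loss(pred, target, beta=1.0):
    """Mean Huber-style loss: quadratic for |d| < beta, linear beyond it."""
    total = 0.0
    for p, y in zip(pred, target):
        d = abs(p - y)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def mse_loss(pred, target):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(pred)


huber = smooth_l1_loss([0.5, 2.0], [0.0, 0.0])
mse = mse_loss([0.5, 2.0], [0.0, 0.0])
```

For TD errors of 0.5 and 2.0, the smooth L1 loss averages 0.125 and 1.5, while MSE averages 0.25 and 4.0, so large errors dominate the MSE update.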

Resources

https://github.com/berkeleydeeprlcourse

fujiki-1emon/DeepRLExercises
