This is a DQN implementation in Sony NNabla. A similar implementation in PyTorch is also available.
- gym
  - `pip install gym`
  - `pip install gym[atari]`
- TensorboardX
  - `pip install tensorboardX`
Run (see options with `-h`): `python train_atari.py`
This will output the following files in a log folder (`.tmp.monitor` by default):
- Mean episodic rewards in each epoch over time.
- Learned models in `.nnp` format.
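If you want to inspect one of the saved models, the `.nnp` file can be loaded back with NNabla's `NnpLoader`. The sketch below is only illustrative; the file name and the network name stored inside the `.nnp` file are hypothetical and depend on your run.

```python
# Minimal sketch: inspect a learned .nnp model with NNabla's NnpLoader.
# The file name "qnet_1000000.nnp" is a hypothetical example; use the
# actual file produced in your log folder.
from nnabla.utils.nnp_graph import NnpLoader

nnp = NnpLoader("qnet_1000000.nnp")
print(nnp.get_network_names())  # networks stored in the file

net = nnp.get_network(nnp.get_network_names()[0], batch_size=1)
print(net.inputs)   # input variables (e.g. the stacked observation frames)
print(net.outputs)  # output variables (the predicted Q-values)
```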
Run (see options with `-h`): `python play_atari.py`
This implementation uses OpenAI Baselines' parameters by default.
Parameter | Ours | Baseline (defaults.py) |
---|---|---|
Learning Rate | 1e-4 | 1e-4 |
Start Epsilon | 1.0 | 1.0 |
Final Epsilon | 0.01 | 0.01 |
Epsilon Decay Steps | 1e6 | 1e6 (when max_step: 1e8) |
Discount Factor (gamma) | 0.99 | 0.99 |
Batch Size | 32 | 32 |
Replay Buffer Size | 10000 | 10000 |
Target Network Update Freq. | 1000 | 1000 |
Learning Start Step | 10000 | 10000 |
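For reference, the exploration schedule implied by the table (start epsilon 1.0, final epsilon 0.01, decayed over 1e6 steps) can be written as a small helper. This is a sketch that assumes a linear decay, as in OpenAI Baselines' schedule; check the script's options for the exact behavior used here.

```python
# Sketch of a linear epsilon-greedy schedule using the values from the
# table above (assumes linear decay, as in OpenAI Baselines).
START_EPSILON = 1.0
FINAL_EPSILON = 0.01
EPSILON_DECAY_STEPS = int(1e6)

def epsilon_at(step):
    """Exploration rate after `step` environment steps."""
    fraction = min(step / EPSILON_DECAY_STEPS, 1.0)
    return START_EPSILON + fraction * (FINAL_EPSILON - START_EPSILON)

# e.g. epsilon_at(0) == 1.0, epsilon_at(500_000) == 0.505,
#      epsilon_at(2_000_000) == 0.01
```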
There are two steps when evaluating Atari games.

- Intermediate evaluation during training
  - Every 1M frames (250K steps), the mean reward is evaluated using the Q-Network parameters at that timestep (see the sketch after this list).
  - The evaluation lasts for 500K frames (125K steps), but the last episode that exceeds 125K timesteps is not used for evaluation.
  - Epsilon is set to 0.05 (not fully greedy).
- Final evaluation for reporting
  - The Q-Network parameters (`.nnp` file) with the best mean reward are used. You can look through `Eval-Score.series.txt` and find the step with the maximum score. Note that the `XXX` in `qnet_XXX.nnp` represents steps, not frames, whereas the scores in `Eval-Score.series.txt` are logged against frames.
  - Using the best `.nnp` file, run the command below to play the specified game for 30 episodes with a termination threshold of 4500 steps.
    Run (see options with `-h`): `python eval_atari.py`
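The intermediate evaluation described above can be summarized by the following sketch. The constants come from the description (epsilon 0.05, a budget of 125K steps, and dropping the episode that overruns the budget); `make_env` and `select_action` are hypothetical placeholders for the actual environment construction and greedy Q-network inference, not functions from this repository.

```python
import random

EVAL_STEPS = 125_000   # 500K frames with a frame skip of 4
EVAL_EPSILON = 0.05

def evaluate(q_network, make_env, select_action):
    """Sketch of the intermediate evaluation: run episodes for up to
    EVAL_STEPS steps with epsilon=0.05 and average the completed episodes.
    `make_env` and `select_action` are hypothetical placeholders."""
    env = make_env()
    episode_rewards, current_reward, steps = [], 0.0, 0
    obs = env.reset()
    while steps < EVAL_STEPS:
        if random.random() < EVAL_EPSILON:
            action = env.action_space.sample()
        else:
            action = select_action(q_network, obs)  # greedy w.r.t. Q-values
        obs, reward, done, _ = env.step(action)
        current_reward += reward
        steps += 1
        if done:
            episode_rewards.append(current_reward)
            current_reward = 0.0
            obs = env.reset()
    # The last, unfinished episode (the one exceeding the step budget)
    # is dropped, so only completed episodes count toward the mean.
    return sum(episode_rewards) / max(len(episode_rewards), 1)
```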