Reproduction of "Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update" (NeurIPS 2019) with TensorFlow
- TensorFlow-GPU > 1.13 with eager execution, or TensorFlow 2.x
- TensorFlow Probability 0.6.0
- OpenAI baselines
- OpenAI Gym
The following command should train an agent on "Breakout" for 20M frames.
python run_atari.py --env BreakoutNoFrameskip-v4
- deepq.py: steps the environment, stores experience, and saves models.
- deepq_learner.py: action selection and the EBU training update (a sketch of the backward target computation follows this list).
- replay_buffer.py: the replay buffer used for EBU.
- models.py: the Q-network.
- run_atari.py: hyper-parameter settings; running this file starts training.
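For orientation, the heart of EBU is the backward target computation over a whole sampled episode, as described in the paper. Below is a minimal NumPy sketch of that step, not the repository's actual code: the function name, argument layout, and the assumption that the sampled episode ends in a terminal state are illustrative, and `beta` is the diffusion coefficient from the paper.

```python
import numpy as np

def ebu_targets(q_next, actions, rewards, gamma=0.99, beta=0.5):
    """Illustrative Episodic Backward Update targets for one sampled episode.

    q_next[t]  : target-network Q-values of the next state s_{t+1} of transition t,
                 shape [T, num_actions]
    actions[t] : action a_t actually taken at step t
    rewards[t] : reward r_t received at step t
    The episode is assumed to end in a terminal state at the last transition.
    """
    T = len(rewards)
    q_tilde = q_next.copy()            # temporary target Q-table, modified backward in place
    y = np.zeros(T, dtype=np.float32)
    y[T - 1] = rewards[T - 1]          # terminal transition: target is just the reward
    for k in range(T - 2, -1, -1):
        a_next = actions[k + 1]        # action actually taken at the next state s_{k+1}
        # Diffuse the already-computed target y[k+1] into that action's Q-value.
        q_tilde[k, a_next] = beta * y[k + 1] + (1.0 - beta) * q_tilde[k, a_next]
        # Standard max-backup through the modified Q-table of s_{k+1}.
        y[k] = rewards[k] + gamma * q_tilde[k].max()
    return y                           # regression targets for Q(s_k, a_k)
```

The network is then trained to regress Q(s_k, a_k) toward y[k] for every transition of the episode, which is what lets a single terminal reward propagate through the whole episode in one update.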
The data for each run is stored on disk under the result directory, in a run directory named
<env-id>-<algorithm>-<date>-<time>.
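As an illustration only (not the repository's code), such a run directory could be assembled as follows; the exact date/time format is an assumption.

```python
# Hypothetical sketch of how a run directory under result/ might be named.
import os
import datetime

def make_run_dir(env_id, algorithm="EBU", root="result"):
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")  # assumed format
    run_dir = os.path.join(root, f"{env_id}-{algorithm}-{stamp}")
    os.makedirs(run_dir, exist_ok=True)
    return run_dir

print(make_run_dir("BreakoutNoFrameskip-v4"))
```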
Each run directory contains:
- log.txt: records the episode, exploration rate, episodic mean reward during training (after normalization, as used for training), episodic mean score (raw game score), current timesteps, and the percentage of training completed.
- monitor.csv: environment monitor file produced by the logger from OpenAI Baselines.
- parameters.txt: all hyper-parameters used in training.
- progress.csv: the same data as log.txt, but in CSV format.
- evaluate scores.txt: evaluation of the policy for 108000 frames every 1e5 training steps with the 30 no-op evaluation protocol.
- model_10M.h5, model_20M.h5, model_best_10M.h5, model_best_20M.h5: the saved policy files (a hedged loading and evaluation sketch follows this list).
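To sanity-check a saved policy outside of training, something like the sketch below can be used. It assumes the .h5 files are full tf.keras models saved with model.save (if they only contain weights, rebuild the network from models.py and use load_weights instead) and that the standard OpenAI Baselines Atari wrappers reproduce the preprocessing used during training; evaluate_policy and its arguments are illustrative names, not the repository's API.

```python
import numpy as np
import tensorflow as tf
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

def evaluate_policy(model_path, env_id="BreakoutNoFrameskip-v4", max_noops=30):
    """Run one greedy episode starting from a random number (1..30) of no-ops."""
    q_net = tf.keras.models.load_model(model_path)   # assumes a full Keras model was saved
    env = wrap_deepmind(make_atari(env_id), episode_life=False,
                        clip_rewards=False, frame_stack=True)
    obs = env.reset()
    done, score = False, 0.0
    for _ in range(np.random.randint(1, max_noops + 1)):  # 30 no-op evaluation protocol
        obs, _, done, _ = env.step(0)                      # action 0 is NOOP in Atari
    while not done:
        q_values = q_net(np.asarray(obs, dtype=np.float32)[None])  # add batch dimension
        obs, reward, done, _ = env.step(int(np.argmax(q_values)))  # greedy action
        score += reward
    return score

# Example usage (path is a placeholder):
# print(evaluate_policy("result/<run-directory>/model_best_20M.h5"))
```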