Human-level-control-through-deep-reinforcement-learning

A JAX/stax implementation of the Nature paper "Human-level control through deep reinforcement learning" [1].

The agent in dqn/agent.py implements the bsuite.baselines.base.Agent interface, and dqn/train.py interacts with a dm_env.Environment. The gym Atari suite is wrapped with the bsuite.utils.gym_wrapper.DMEnvFromGym adapter and then in a dqn.AtariEnv, which adds the observation history and action repeat.
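To make the two ideas in the paragraph above concrete, here is a minimal, dependency-free sketch of action repeat and observation history driven by an agent with `select_action` and `update` methods. It is not the code in this repo: `CountingEnv`, `HistoryActionRepeat`, `RandomAgent` and `run_episode` are hypothetical names, plain tuples stand in for `dm_env.TimeStep`, and the frame max-pooling and preprocessing described in the paper [1] are left out.

```python
import random
from collections import deque
from typing import Tuple


class CountingEnv:
    """Hypothetical toy environment: the observation is the step index."""

    def __init__(self, episode_length: int = 20):
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_length
        return self.t, reward, done


class HistoryActionRepeat:
    """Repeats each action and exposes the last few observations as a tuple."""

    def __init__(self, env, action_repeat: int = 4, history_length: int = 4):
        self.env = env
        self.action_repeat = action_repeat
        self.history_length = history_length
        self.frames = deque(maxlen=history_length)

    def reset(self) -> Tuple[int, ...]:
        first = self.env.reset()
        # Fill the history with copies of the first observation.
        self.frames.clear()
        self.frames.extend([first] * self.history_length)
        return tuple(self.frames)

    def step(self, action: int) -> Tuple[Tuple[int, ...], float, bool]:
        total_reward, done = 0.0, False
        # Apply the same action several times and sum the rewards.
        for _ in range(self.action_repeat):
            observation, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        # Only the last observation of the repeat enters the history.
        self.frames.append(observation)
        return tuple(self.frames), total_reward, done


class RandomAgent:
    """Stand-in agent with a simplified select_action/update pair."""

    def __init__(self, num_actions: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.num_actions = num_actions
        self.updates = 0

    def select_action(self, observation) -> int:
        return self.rng.randrange(self.num_actions)

    def update(self, observation, action, reward, next_observation) -> None:
        # A learning agent would store the transition and train here.
        self.updates += 1


def run_episode(env, agent) -> float:
    """Runs one episode and returns the undiscounted return."""
    observation = env.reset()
    episode_return, done = 0.0, False
    while not done:
        action = agent.select_action(observation)
        next_observation, reward, done = env.step(action)
        agent.update(observation, action, reward, next_observation)
        observation = next_observation
        episode_return += reward
    return episode_return
```

With `action_repeat=4`, a 20-step episode of the toy environment gives the agent five decisions, and each observation it receives is a tuple of the four most recent retained frames.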

Implementation status of some of the techniques used in the paper:

Installation

To run the algorithm on a GPU, I suggest installing the GPU version of JAX [4]. You can then install this repo using Anaconda Python and pip.

conda create -n dqn python
conda activate dqn
pip install git+https://github.com/epignatelli/human-level-control-through-deep-reinforcement-learning

References

[1] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G. and Petersen, S., 2015. Human-level control through deep reinforcement learning. Nature, 518(7540), pp.529-533.

[2] Lin, L.-J., 1993. Reinforcement learning for robots using neural networks. Technical Report, DTIC Document.

[3] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

[4] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S. and Zhang, Q., 2018. JAX: composable transformations of Python+NumPy programs.

[5] Bellemare, M.G., Veness, J. and Bowling, M., 2012. Investigating contingency awareness using Atari 2600 games. Proc. Conf. AAAI Artif. Intell., pp.864-871.

[6] Sutton, R.S. and Barto, A.G., 1998. Introduction to reinforcement learning (Vol. 135). Cambridge: MIT press.