dosovits/RL_tutorial

Tutorial with basic Q-learning and policy gradient

RL tutorial

Dependencies

Python 3.6 (other Python 3 versions will almost certainly work; Python 2 may work after some adjustments)
numpy
gym (basic version)
pytorch 0.4.0

These can be conveniently installed with conda in a conda environment. The basic version of gym is pip-installable.
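
After installing, it can help to confirm that the environment actually contains the packages listed above. The following is a minimal sketch, not part of the tutorial code: the helper name `check_dependencies` is hypothetical, and it assumes the usual import names `numpy`, `gym` and `torch` (PyTorch is imported as `torch`).

```python
import importlib
import sys

# Import names for the dependencies listed above (assumption: standard names;
# PyTorch is imported as `torch`).
REQUIRED_MODULES = ("numpy", "gym", "torch")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to its version string.

    A missing module maps to None; a module without a `__version__`
    attribute maps to the string "unknown".
    """
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_dependencies().items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

Run inside the activated conda environment, this prints the Python version and one line per package, so a missing or unexpected version is easy to spot before starting the tutorial.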

Acknowledgements

This code is partially based on the tutorial of Arthur Juliani.