Tutorial with basic Q-learning and policy gradient
Requirements:
- Python 3.6 (other Python 3 versions will almost certainly work; Python 2 may work after some adjustments)
- numpy
- gym (basic version)
- pytorch 0.4.0
These can be conveniently installed in a conda environment; the basic version of gym can be installed with pip.
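
After installing, it can help to confirm that the packages listed above are importable from the active environment. The snippet below is a minimal sketch of such a check, not part of the tutorial code itself: the helper name `installed_versions` and the `REQUIRED` list are made up for illustration, and it assumes the usual import names (`numpy`, `gym`, and `torch` for pytorch).

```python
import importlib
import sys

# Import names for the packages listed above (assumption: pytorch is imported as "torch").
REQUIRED = ["numpy", "gym", "torch"]


def installed_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not available in this environment.
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in installed_versions(REQUIRED).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as NOT INSTALLED is missing from the active environment and needs to be installed before running the tutorial.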
This code is partly based on the tutorial by Arthur Juliani.