This is an attempt to implement the bandit experiments from the paper "Learning to reinforcement learn".
The implementation should be mostly correct. It is based on this TensorFlow repo: https://github.com/awjuliani/Meta-RL
The Labyrinth experiments will be pushed soon.
To run: python a3c-bandit.py --num-threads=32 --episode-len=100
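
For readers new to the setting, here is a minimal sketch of the kind of bandit task the agent is trained on. It is not taken from a3c-bandit.py: the names TwoArmedBandit and run_episode are hypothetical, and it assumes a two-armed Bernoulli bandit whose arm probabilities sum to 1, with a policy that sees the previous action, the previous reward and the timestep. The episode length of 100 mirrors the --episode-len value above.

```python
import random


class TwoArmedBandit:
    """Hypothetical two-armed Bernoulli bandit (assumed setup, not the repo's code)."""

    def __init__(self, episode_len=100, seed=None):
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Sample a new task: arm 0 pays with probability p, arm 1 with 1 - p.
        p = self.rng.random()
        self.probs = [p, 1.0 - p]
        self.t = 0

    def step(self, action):
        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if self.rng.random() < self.probs[action] else 0.0
        self.t += 1
        done = self.t >= self.episode_len
        return reward, done


def run_episode(env, policy):
    """Run one episode; the policy sees (previous action, previous reward, timestep)."""
    env.reset()
    prev_action, prev_reward, total, done = 0, 0.0, 0.0, False
    while not done:
        action = policy(prev_action, prev_reward, env.t)
        reward, done = env.step(action)
        total += reward
        prev_action, prev_reward = action, reward
    return total


if __name__ == "__main__":
    env = TwoArmedBandit(episode_len=100, seed=0)
    # Baseline policy that always pulls arm 0.
    print(run_episode(env, lambda a, r, t: 0))
```

A trained recurrent agent would replace the lambda above. Because the arm probabilities are resampled on every reset, the agent has to work out within each episode which arm is better.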