An MXNet implementation of meta-RL

This is an attempt to implement the bandit experiments from the paper "Learning to reinforcement learn".
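To make the bandit setting concrete, here is a minimal sketch of a dependent two-armed bandit task of the kind used in the paper. It is not taken from this repository: the names TwoArmedBandit and run_episode are hypothetical, and the choice of arm probabilities p and 1 - p is an assumption about the task variant. The policy callback receives the previous action, previous reward, and timestep, which is the extra input a meta-RL agent relies on to adapt within an episode.

```python
import random


class TwoArmedBandit:
    """Hypothetical dependent bandit: arm 0 pays with probability p, arm 1 with 1 - p."""

    def __init__(self, rng=None):
        # Allow a seeded generator to be passed in for reproducibility.
        self.rng = rng or random.Random()
        self.reset()

    def reset(self):
        # Sample a new task; the probabilities stay fixed for the whole episode.
        p = self.rng.random()
        self.probs = [p, 1.0 - p]
        return self.probs

    def pull(self, arm):
        # Reward is 1 with the chosen arm's probability, otherwise 0.
        return 1 if self.rng.random() < self.probs[arm] else 0


def run_episode(env, policy, episode_len=100):
    """Run one episode and return the total reward.

    policy(prev_action, prev_reward, t) must return an arm index (0 or 1).
    """
    env.reset()
    prev_action, prev_reward, total = 0, 0, 0
    for t in range(episode_len):
        action = policy(prev_action, prev_reward, t)
        reward = env.pull(action)
        total += reward
        prev_action, prev_reward = action, reward
    return total
```

A trained recurrent agent would take the place of the policy argument; for a quick check, a fixed policy such as lambda a, r, t: 0 is enough to exercise the loop.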

The algorithm should be mostly correct. This work is based on the TensorFlow implementation at https://github.com/awjuliani/Meta-RL.

The Labyrinth experiments will be pushed soon.

Usage

Run: python a3c-bandit.py --num-threads=32 --episode-len=100