Reinforcement learning experiments

The notebooks, in the order I completed them:

Experiments on using XGBoost as a Q-approximating function

Optimising a simple heuristic, which eventually solved the environment

Further experiments on Q-learning and using a heuristic loss

There is also a notebook in which I used a heuristic instead of random exploration during training, though without any considerable success. The next things I would try are increasing the batch size and then searching for an optimal learning rate.
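To make the idea of heuristic exploration concrete, here is a rough sketch of how it could look. This is not the code from the notebook: `select_action`, `toy_heuristic` and the state layout are made-up names for illustration, and the only change from plain epsilon-greedy is that the exploratory branch asks the heuristic for an action instead of sampling one uniformly at random.

```python
import random
from typing import Callable, Sequence


def select_action(
    q_values: Sequence[float],
    state: Sequence[float],
    heuristic: Callable[[Sequence[float]], int],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy selection with a heuristic in the exploration branch."""
    if rng.random() < epsilon:
        # Exploration step: follow the heuristic rather than a random action.
        return heuristic(state)
    # Exploitation step: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def toy_heuristic(state: Sequence[float]) -> int:
    # Hypothetical rule for a two-action task: choose action 1 when the
    # first state feature is positive, otherwise action 0.
    return 1 if state[0] > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    # epsilon=1.0 always defers to the heuristic, epsilon=0.0 never does.
    print(select_action([0.9, 0.1], [1.0], toy_heuristic, 1.0, rng))
    print(select_action([0.9, 0.1], [1.0], toy_heuristic, 0.0, rng))
```

One thing to watch with this setup is that a deterministic heuristic only ever visits the states it steers towards, so the Q-function may see much less variety than with random exploration. Mixing in a small share of random actions is one possible remedy, though I have not tested that here.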

