Reinforcement learning experiments
Notebooks, in the order I completed them:
Experiments on using XGBoost as the Q-approximating function
Optimising a simple heuristic, which eventually solved the environment
Further experiments on Q-learning and using a heuristic loss
There is also another notebook where I used a heuristic instead of random exploration during training, but without any considerable success. The next things I would try are increasing the batch size and searching for an optimal learning rate.
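
For readers unfamiliar with the idea of exploring with a heuristic instead of random actions, here is a minimal sketch of what that action selection could look like. It is not the code from the notebook: the function names (`select_action`, `sign_heuristic`), the two-action setup, and the shape of the state are all assumptions made for illustration.

```python
import random
from typing import Callable, Sequence

State = Sequence[float]


def sign_heuristic(state: State) -> int:
    """Hypothetical heuristic: choose action 1 if the first state
    component is positive, otherwise action 0."""
    return 1 if state[0] > 0 else 0


def select_action(
    q_values: Sequence[float],
    state: State,
    heuristic: Callable[[State], int],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy selection where the exploratory action comes
    from a heuristic rather than being drawn uniformly at random."""
    if rng.random() < epsilon:
        # Exploration step: defer to the heuristic.
        return heuristic(state)
    # Exploitation step: take the action with the highest Q estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
state = [0.3, -1.2]
q_values = [0.9, 0.1]

# epsilon=1.0 always follows the heuristic, epsilon=0.0 is purely greedy.
explore_action = select_action(q_values, state, sign_heuristic, 1.0, rng)
greedy_action = select_action(q_values, state, sign_heuristic, 0.0, rng)
```

In a training loop, `epsilon` would typically be decayed over time so that the agent relies on the heuristic early on and on its own Q estimates later.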