RL-Bandit: an implementation of the Upper Confidence Bound (UCB) section from "Reinforcement Learning: An Introduction" by Sutton & Barto. Just playing around and trying different approaches.
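
For orientation, here is a minimal sketch of UCB action selection on a k-armed bandit. It is not necessarily how this repo does it. The names `ucb_select` and `run_bandit`, the unit-variance Gaussian rewards, and the default `c=2.0` are all assumptions made for illustration. Each step picks the arm maximizing Q(a) + c * sqrt(ln(t) / N(a)), and an arm that has never been pulled is treated as maximizing.

```python
import math
import random


def ucb_select(q_values, counts, t, c=2.0):
    """Return the index of the arm maximizing Q(a) + c * sqrt(ln(t) / N(a))."""
    # An arm with N(a) == 0 is treated as a maximizing action, so try it first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    # Ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_means, steps=1000, c=2.0, seed=0):
    """Run UCB on a stationary bandit (assumed: Gaussian rewards, std 1)."""
    rng = random.Random(seed)
    k = len(true_means)
    q_values = [0.0] * k  # sample-average reward estimates
    counts = [0] * k      # number of pulls per arm
    for t in range(1, steps + 1):
        arm = ucb_select(q_values, counts, t, c)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the estimate.
        q_values[arm] += (reward - q_values[arm]) / counts[arm]
    return q_values, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.2, 0.5, 1.0], steps=2000)
    print("estimates:", [round(q, 2) for q in estimates])
    print("pulls:", pulls)
```

Larger values of `c` widen the confidence bonus and push the agent to explore more; smaller values make it greedier.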