A simple toy example in which the player interacts with a Markov Decision Process (MDP) and tries to infer the optimal policy from the states and rewards it returns.
The MDP implemented in the code is the following:
The code is thoroughly commented and can easily be adapted to any MDP.
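
As a rough illustration of what adapting the code to another MDP could involve, here is a minimal sketch in Python. It is not the MDP or the code from this project: the states `s0` and `s1`, the actions `a` and `b`, the rewards, and the names `TRANSITIONS`, `step`, `play` and `value_iteration` are all hypothetical, chosen only to show one way of describing an MDP as a transition table, sampling states and rewards from it, and checking a guessed policy against the one found by value iteration.

```python
import random

# Hypothetical two-state MDP, NOT the one scripted in this project.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "s0": {"a": [(1.0, "s0", 0.0)],
           "b": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"a": [(1.0, "s0", 2.0)],
           "b": [(1.0, "s1", 0.0)]},
}


def step(state, action, rng=random):
    """Sample one (next_state, reward) pair for taking `action` in `state`."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def play(policy, start="s0", steps=10, seed=0):
    """Follow `policy` (a dict state -> action) and return the summed reward."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(steps):
        state, reward = step(state, policy[state], rng)
        total += reward
    return total


def value_iteration(gamma=0.9, sweeps=200):
    """Return a greedy policy after `sweeps` rounds of value iteration."""
    values = {s: 0.0 for s in TRANSITIONS}

    def q(s, a):
        # Expected discounted return of taking action a in state s.
        return sum(p * (r + gamma * values[s2])
                   for p, s2, r in TRANSITIONS[s][a])

    for _ in range(sweeps):
        values = {s: max(q(s, a) for a in TRANSITIONS[s])
                  for s in TRANSITIONS}
    return {s: max(TRANSITIONS[s], key=lambda a: q(s, a))
            for s in TRANSITIONS}


if __name__ == "__main__":
    guess = {"s0": "b", "s1": "a"}  # the player's guessed policy
    print("reward collected by the guess:", play(guess))
    print("guess matches value iteration:", guess == value_iteration())
```

In this sketch, swapping in a different MDP only means editing the `TRANSITIONS` table; the discount factor `gamma=0.9` is likewise an assumption, not a value taken from the project.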