This is a playground for reinforcement learning (RL) code and textbook material.
The code implements individual building blocks rather than reusable modules.
Implementations so far: an epsilon-greedy agent, with tests for different values of epsilon (the exploration-exploitation trade-off), different step sizes, and a non-stationary environment (i.e. one whose reward distribution changes suddenly).
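
The epsilon-greedy setup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the names `EpsilonGreedyAgent` and `run_bandit`, the Gaussian reward noise, and the single change point are all assumptions made for the example. With probability epsilon the agent picks a random arm, otherwise the arm with the highest estimate; `step_size=None` means a sample-average update, while a constant step size weights recent rewards more heavily.

```python
import random


class EpsilonGreedyAgent:
    """Epsilon-greedy action-value agent (hypothetical name, not from the repo)."""

    def __init__(self, n_arms, epsilon, step_size=None, seed=0):
        self.epsilon = epsilon
        self.step_size = step_size  # None -> sample average (1 / visit count)
        self.q = [0.0] * n_arms     # current value estimate per arm
        self.counts = [0] * n_arms  # number of times each arm was pulled
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        # Break ties between equally good arms at random.
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def update(self, action, reward):
        self.counts[action] += 1
        if self.step_size is None:
            alpha = 1.0 / self.counts[action]
        else:
            alpha = self.step_size
        # Incremental update: Q <- Q + alpha * (reward - Q)
        self.q[action] += alpha * (reward - self.q[action])


def run_bandit(agent, means_before, means_after, steps=2000, change_at=1000, seed=1):
    """Assumed non-stationary bandit: arm means switch suddenly at `change_at`."""
    env_rng = random.Random(seed)
    rewards = []
    for t in range(steps):
        means = means_before if t < change_at else means_after
        action = agent.select_action()
        reward = env_rng.gauss(means[action], 1.0)  # unit-variance Gaussian noise
        agent.update(action, reward)
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    for step_size in (None, 0.1):
        agent = EpsilonGreedyAgent(n_arms=3, epsilon=0.1, step_size=step_size)
        rewards = run_bandit(agent, [0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
        tail = rewards[-500:]
        print(f"step_size={step_size}: mean reward over last 500 steps = "
              f"{sum(tail) / len(tail):.3f}")
```

Running the script prints the average reward after the change for a sample-average agent and a constant-step-size agent, which is one way to compare how quickly each adapts when the reward distribution shifts.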