A one-dimensional corridor environment for reinforcement learning. The goal is to reach the end of the corridor, i.e. to move right [length] times.
Inspired by Ray's RLlib example environment and Sutton and Barto's examples (e.g. example 13.1 on page 323).
Compatible with AgentOS.
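
The corridor dynamics described above can be sketched in a few lines of Python. This is only an illustrative sketch, not this package's actual code or the AgentOS API: the class name CorridorSketch, the action encoding (0 = left, 1 = right), the reward of -1 per step, and the helper run_always_right are all assumptions made for the example.

```python
class CorridorSketch:
    """Hypothetical stand-in for a 1-D corridor; not the package's real class."""

    LEFT, RIGHT = 0, 1  # assumed action encoding

    def __init__(self, length=5):
        if length < 1:
            raise ValueError("length must be at least 1")
        self.length = length
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        if action == self.RIGHT:
            self.position += 1
        elif action == self.LEFT:
            # The agent cannot walk past the start of the corridor.
            self.position = max(0, self.position - 1)
        else:
            raise ValueError(f"unknown action: {action!r}")
        done = self.position >= self.length
        reward = -1.0  # assumed: constant per-step cost, so shorter episodes score higher
        return self.position, reward, done


def run_always_right(length):
    """Roll out the 'always move right' policy; return (steps taken, total reward)."""
    env = CorridorSketch(length)
    env.reset()
    steps, total_reward, done = 0, 0.0, False
    while not done:
        _, reward, done = env.step(CorridorSketch.RIGHT)
        steps += 1
        total_reward += reward
    return steps, total_reward
```

Under these assumptions, a policy that always moves right finishes an episode in exactly [length] steps, which is the shortest possible episode.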