
simple_rl

A simple framework for experimenting with Reinforcement Learning in Python 2.7.

There are loads of other great libraries out there for RL. The aim of this one is twofold:

  1. Simplicity.
  2. Reproducibility of results.

A brief tutorial for a slightly earlier version is available here.

The core library requires only numpy and matplotlib. Some MDPs also have visuals, which require pygame.

Also includes support for hooking into any of the OpenAI Gym environments.

Installation

The easiest way to install is with pip. Just run:

pip install simple_rl

Alternatively, you can download simple_rl here.

Example

Some examples showcasing basic functionality are included in the examples directory.

To run a simple experiment, import the run_agents_on_mdp(agent_list, mdp) function from simple_rl.run_experiments and call it with some agents for a given MDP. For example:

# Imports
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearnerAgent

# Set up the MDP and a Q-learning agent over its actions.
mdp = GridWorldMDP()
agent = QLearnerAgent(actions=mdp.get_actions())

# Run the experiment.
run_agents_on_mdp([agent], mdp)
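The number of runs, episodes, and steps can be passed as keyword arguments. The names below reflect the run_agents_on_mdp signature at the time of writing, so treat them as assumptions if you are on a different version:

# Average results over 5 runs of 100 episodes, capped at 50 steps each.
# (Assumption: keyword names per the current run_agents_on_mdp signature.)
run_agents_on_mdp([agent], mdp, instances=5, episodes=100, steps=50)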

Overview

  • (agents): Code for some basic agents (a random actor, Q-learner, R-Max [Brafman and Tennenholtz 2002], a Q-learner with a linear approximator, etc.).

  • (experiments): Code for an Experiment class to reproduce results.

  • (mdp): Code for a basic MDP and MDPState class. Also contains an OO-MDP implementation [Diuk et al. 2008].

  • (planning): Implementations of planning algorithms, including ValueIteration and MCTS [Coulom 2006].

  • (tasks): Implementations of a few standard MDPs (grid world, n-chain, Taxi [Dietterich 2000], etc.). Recently added support for the OpenAI Gym (see the sketch after this list).

  • (utils): Code for charting utilities.
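As a quick illustration of the Gym hook mentioned above, here is a minimal sketch. It assumes gym is installed and that the GymMDP class in simple_rl.tasks wraps an environment by name; the exact constructor arguments may vary across versions:

from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GymMDP
from simple_rl.agents import RandomAgent

# Wrap a Gym environment as an MDP and run a random agent on it.
gym_mdp = GymMDP(env_name='CartPole-v0', render=False)
agent = RandomAgent(gym_mdp.get_actions())
run_agents_on_mdp([agent], gym_mdp)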

Making a New MDP

Make an MDP subclass, which needs:

  • A static variable, ACTIONS, which is a list of strings denoting each action.

  • A reward function and a transition function, passed to the MDP constructor (along with ACTIONS).

  • I also suggest overriding the "__str__" method of the class and adding an "__init__.py" file to the directory.

  • Create a State subclass for your MDP (if necessary). I suggest overriding the "__hash__", "__eq__", and "__str__" methods so the class plays well with the agents. A minimal sketch of a custom MDP follows this list.
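Here is a minimal sketch of a custom chain MDP following those steps. The class and module names are illustrative, and the assumed base-class interfaces (an MDP constructor taking the action list, a transition function, a reward function, and an initial state; a State holding a data field) may differ across versions:

# chain_mdp.py: a minimal sketch of a simple chain MDP.
from simple_rl.mdp.MDPClass import MDP
from simple_rl.mdp.StateClass import State

class ChainMDP(MDP):

    # Static list of strings denoting each action.
    ACTIONS = ["left", "right"]

    def __init__(self, num_states=5):
        self.num_states = num_states
        MDP.__init__(self, ChainMDP.ACTIONS, self._transition_func,
                     self._reward_func, init_state=State(data=[0]))

    def _reward_func(self, state, action):
        # Reward 1 for taking "right" in the last state, 0 otherwise.
        if state.data[0] == self.num_states - 1 and action == "right":
            return 1
        return 0

    def _transition_func(self, state, action):
        # Deterministic chain: "right" moves up, "left" moves down.
        loc = state.data[0]
        if action == "right":
            loc = min(loc + 1, self.num_states - 1)
        else:
            loc = max(loc - 1, 0)
        return State(data=[loc])

    def __str__(self):
        return "chain-" + str(self.num_states)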

Making a New Agent

Make an Agent subclass, which requires:

  • A method, act(self, state, reward), that returns an action.

  • A method, reset(), that returns the agent to its tabula rasa state (see the sketch below).
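Here is a minimal sketch of a uniform-random agent meeting those requirements. The names are illustrative, and the assumed Agent constructor (taking a name and an action list) may differ across versions:

# random_agent.py: a minimal sketch of a custom agent.
import random

from simple_rl.agents.AgentClass import Agent

class UniformRandomAgent(Agent):

    def __init__(self, actions):
        Agent.__init__(self, name="uniform-random", actions=actions)

    def act(self, state, reward):
        # Ignore the state and reward; choose an action uniformly at random.
        return random.choice(self.actions)

    def reset(self):
        # Nothing to reset: this agent keeps no learned state.
        pass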

Let me know if you have any questions or suggestions.

Cheers,

-Dave