Reinforce

Abstractions, algorithms, and utilities for reinforcement learning in Julia

Reinforce.jl is an interface for reinforcement learning: it is intended to connect modular environments, policies, and solvers through a simple, common API.


Packages which build on Reinforce:


New environments are created by subtyping AbstractEnvironment and implementing a few methods:

  • reset!(env)
  • actions(env, s) --> A
  • step!(env, s, a) --> r, s′
  • finished(env, s′)

and optional overrides:

  • state(env) --> s
  • reward(env) --> r
  • ismdp(env) --> bool

The state and reward methods fall back to the fields env.state and env.reward when they are not overridden.

An environment may be fully observable (MDP) or partially observable (POMDP). In the case of a partially observable environment, the state s is really an observation o. To maintain consistency, we call everything a state, and assume that an environment is free to maintain additional (unobserved) internal state. The ismdp query returns true when the environment is MDP, and false otherwise.
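As a rough illustration of the interface, here is a minimal sketch of a custom environment: a 1-D random walk that ends when the walker reaches ±10 and pays a reward of 1 at +10. The type name and dynamics are made up for this example; the state and reward fields are what the default state(env) and reward(env) fall back to.

using Reinforce

# illustrative only: a 1-D random walk environment
mutable struct RandomWalkEnv <: AbstractEnvironment
    state::Int
    reward::Float64
end
RandomWalkEnv() = RandomWalkEnv(0, 0.0)

Reinforce.reset!(env::RandomWalkEnv) = (env.state = 0; env.reward = 0.0; env)

Reinforce.actions(env::RandomWalkEnv, s) = [-1, 1]   # step left or right

function Reinforce.step!(env::RandomWalkEnv, s, a)
    env.state = s + a
    env.reward = env.state == 10 ? 1.0 : 0.0         # reward only at the right edge
    env.reward, env.state                            # return (r, s′)
end

Reinforce.finished(env::RandomWalkEnv, s′) = abs(s′) >= 10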


TODO: more details and examples


Agents/policies are created by subtyping AbstractPolicy and implementing action. The built-in random policy is a short example:

struct RandomPolicy <: AbstractPolicy end
action(policy::RandomPolicy, r, s′, A′) = rand(A′)

The action method maps the last reward and current state to the next chosen action: (r, s′) --> a′.
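As a slightly less trivial (and purely illustrative) sketch, a policy can carry its own fields; here, with probability ϵ it explores by sampling uniformly from A′, and otherwise takes the first available action as a stand-in for a real greedy choice:

using Reinforce

struct MostlyFirstPolicy <: AbstractPolicy
    ϵ::Float64
end

# explore with probability ϵ, otherwise take the first action in A′
Reinforce.action(π::MostlyFirstPolicy, r, s′, A′) =
    rand() < π.ϵ ? rand(A′) : first(A′)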


Iterate through episodes using the Episode iterator. A 4-tuple (s,a,r,s′) is returned from each step of the episode:

ep = Episode(env, policy)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter
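For example, one might run many episodes back to back and record the return of each; RandomWalkEnv refers to the illustrative environment sketched earlier, but any AbstractEnvironment works the same way:

env = RandomWalkEnv()
policy = RandomPolicy()
returns = Float64[]
for _ in 1:100
    reset!(env)                  # start each episode from the initial state
    ep = Episode(env, policy)
    for (s, a, r, s′) in ep
        # a learner would update its parameters from (s, a, r, s′) here
    end
    push!(returns, ep.total_reward)
end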

There is also a convenience method, run_episode. The following is equivalent to the previous example:

R = run_episode(env, policy) do
    # anything you want... this section is called after each step
end
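Following the same zero-argument block pattern, and assuming run_episode returns the episode's total reward as the example above implies, one could for instance count steps from inside the block (a Ref is used so the counter can be mutated from within the closure):

nsteps = Ref(0)
R = run_episode(env, policy) do
    nsteps[] += 1    # the block runs once after every step of the episode
end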

Author: Tom Breloff