Finite-Horizon Control Gym
The repository contains examples of finite-horizon optimal control problems implemented as environments (MDPs) for reinforcement learning algorithms. Since the problems are originally described by differential equations, a uniform time discretization with diameter dt is used to formalize them as MDPs. In addition, it is important to emphasize that, in problems with a finite horizon, optimal policies depend not only on the phase vector $x$ but also on the time $t$. Thus, we obtain MDPs, depending on dt, with a continuous state space $S$ containing states $s = (t, x)$ and a continuous action space $A$.
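For illustration, if the dynamics are written as $\dot x(t) = f(x(t), u(t))$ on $[0, T]$ (a generic notation assumed here; the repository defines each system individually), then, for example, an explicit Euler scheme with step $dt$ yields transitions

$$x_{k+1} = x_k + f(x_k, u_k)\,dt, \qquad t_{k+1} = t_k + dt, \qquad s_k = (t_k, x_k),$$

so each step of the resulting MDP advances both the time component and the phase vector of the state.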
Interface
The finite-horizon optimal control problems are implemented as environments with an interface close to that of OpenAI Gym, with the following attributes and methods (a usage sketch is given after the list):
state_dim - the state space dimension;
action_dim - the action space dimension;
terminal_time - the terminal time (the finite horizon) of the problem;
dt - the time-discretization diameter;
reset() - to get an initial state (deterministic);
step(action) - to get next_state, current reward, done (True if t > terminal_time, otherwise False), info;
virtual_step(state, action) - to get the same as from step(action), but with the current state passed explicitly as an argument.
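The following is a minimal usage sketch of this interface. DummyEnv is a hypothetical stand-in used only to make the snippet self-contained; it is not one of the repository's environments, and its dynamics and reward are placeholders.

```python
import numpy as np

# DummyEnv is a hypothetical stand-in that follows the interface above;
# it is not one of the repository's environments.
class DummyEnv:
    state_dim, action_dim, terminal_time, dt = 2, 1, 1.0, 0.1

    def reset(self):
        self.t, self.x = 0.0, np.zeros(1)
        return np.concatenate(([self.t], self.x))   # state s = (t, x)

    def step(self, action):
        self.x = self.x + self.dt * np.asarray(action, dtype=float)  # placeholder dynamics
        self.t += self.dt
        done = self.t > self.terminal_time
        reward = -float(self.x @ self.x)             # placeholder reward
        return np.concatenate(([self.t], self.x)), reward, done, {}

def rollout(env, policy):
    """Run one episode with a policy mapping the state (t, x) to an action."""
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        state, reward, done, info = env.step(policy(state))
        total_reward += reward
    return total_reward

print(rollout(DummyEnv(), lambda s: np.zeros(DummyEnv.action_dim)))
```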
Environments
The following examples of finite-horizon optimal control problems are implemented:
SimpleControlProblem (state_dim = 2, action_dim = 1)
The dynamical system is described by the simple motion:
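"Simple motion" typically refers to dynamics of the form $\dot x = u$; under that assumption (the exact dynamics and reward of SimpleControlProblem are not reproduced here), one discretized step in the style of virtual_step might look as follows, with dt and terminal_time as arbitrary example values.

```python
import numpy as np

# Hedged sketch of one discretized step for simple-motion dynamics dx/dt = u,
# mirroring the virtual_step(state, action) signature described above.
# dt and terminal_time are arbitrary example values; the reward is omitted.
def virtual_step_simple_motion(state, action, dt=0.05, terminal_time=2.0):
    t, x = state[0], np.asarray(state[1:], dtype=float)
    x_next = x + dt * np.asarray(action, dtype=float)   # dx/dt = u
    t_next = t + dt
    done = t_next > terminal_time
    reward = 0.0                                        # problem-specific reward omitted
    return np.concatenate(([t_next], x_next)), reward, done, {}
```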
TargetProblem is an optimal control problem presented in Munos (2006). The dynamical system describes a hand holding a spring to which a mass is attached:
where $G = 6.67 \times 10^{-20}\,\mathrm{km^3\,kg^{-1}\,s^{-2}}$ is the gravitational constant, $M = 5.97 \times 10^{24}\,\mathrm{kg}$ is the mass of the Earth, and $m = 50\,\mathrm{kg}$ is the mass of the satellite. The aim of the control is to transfer the satellite to a new orbit of radius $7100\,\mathrm{km}$ and to provide the speed required for stable retention in that orbit:
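As a rough, hedged sanity check (assuming the constants above are expressed in kilometres, kilograms and seconds), the speed of a stable circular orbit of radius $r$ satisfies $v = \sqrt{GM/r}$, which for $r = 7100$ km is about $7.5$ km/s:

```python
import math

# Assumed units: km, kg, s (consistent with G ~ 1e-20 above).
G = 6.67e-20        # km^3 / (kg * s^2), gravitational constant
M = 5.97e24         # kg, mass of the Earth
r = 7100.0          # km, target orbit radius
v_circular = math.sqrt(G * M / r)
print(f"circular-orbit speed: {v_circular:.2f} km/s")   # ~7.49 km/s
```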