
MPPI implementation with the OpenAI Gym pendulum environment

This repository implements Model Predictive Path Integral (MPPI) control as introduced in the paper Information Theoretic MPC for Model-Based Reinforcement Learning (Williams et al., 2017), using the OpenAI Gym pendulum environment as the forward model.

Requirements

  • OpenAI Gym
  • numpy
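Because the controller uses the Gym environment itself as the forward model, the key primitive is saving and restoring the simulator state between rollouts. A minimal sketch, assuming the classic 4-tuple Gym step API and the Pendulum-v0 environment id (the repository's actual entry point may differ):

```python
import gym
import numpy as np

env = gym.make("Pendulum-v0")
env.reset()

# The pendulum's internal state is the array [theta, theta_dot]; saving and
# restoring it lets the same environment act as a resettable forward model.
saved = env.unwrapped.state.copy()
obs, reward, done, info = env.step(np.array([0.0]))  # simulate one step of a rollout
env.unwrapped.state = saved                          # rewind to the saved state
```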

Gist of the paper

The paper derives an optimal control law as a (noise-)weighted average over sampled trajectories. In particular, the optimization problem is posed as computing the control input that pushes the controlled distribution Q as close as possible to the optimal distribution Q*, which corresponds to minimizing the KL divergence between Q and Q*.
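In the paper's free-energy formulation this optimal distribution has a closed form. A sketch of the relevant definitions, where $S(V)$ is the trajectory cost, $\lambda$ the temperature, $p$ the density of the uncontrolled dynamics, and $\mathbb{Q}_{U,\Sigma}$ the controlled distribution:

```latex
% Optimal distribution (Williams et al., 2017)
q^*(V) = \frac{1}{\eta}\exp\!\Big(-\frac{1}{\lambda} S(V)\Big)\, p(V),
\qquad
\eta = \int \exp\!\Big(-\frac{1}{\lambda} S(V)\Big)\, p(V)\,\mathrm{d}V

% Control objective: push the controlled distribution towards Q*
U^* = \operatorname*{argmin}_{U}\; \mathbb{D}_{\mathrm{KL}}\big(\mathbb{Q}^* \,\|\, \mathbb{Q}_{U,\Sigma}\big)
```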

The main points (a minimal implementation sketch follows the list):

  • the noise assumption $v_t \sim \mathcal{N}(u_t, \Sigma)$ stems from noise in low-level controllers

  • the noise term can be pulled out of the Monte-Carlo approximation of the normalizer $\eta$ and is neatly interpreted as a weight on the MC samples in the iterative update law

  • given the optimal control-input distribution Q*, the optimal control is obtained as $u_t^* = \int q^*(V)\, v_t \,\mathrm{d}V$

  • computing the integral is not possible since q* is unknown; instead, importance sampling is used, sampling from the proposal distribution Q:

    $u_t^* = \mathbb{E}_{\mathbb{Q}}\left[ w(V)\, v_t \right], \qquad w(V) = \frac{q^*(V)}{q(V)}$

    where the normalizer $\eta$ appearing in $w(V)$ can be approximated by the Monte-Carlo estimate given in Algorithm 2, yielding

    $u_t^{i+1} = u_t^{i} + \sum_{k=1}^{K} w(V_k)\, \varepsilon_t^{k}$

    which amounts to an iterative procedure that improves the MC estimate by using an increasingly accurate importance sampler
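Putting the pieces together, below is a minimal, self-contained sketch of this iterative update in the spirit of Algorithm 2, with the Gym pendulum as the forward model. It is illustrative, not the repository's exact code: the function name `mppi_step` and the parameters K, T, λ, σ are chosen here for exposition, rewards are negated into costs, and the control-cost coupling term in the paper's weight is omitted for brevity.

```python
import gym
import numpy as np

def mppi_step(env, u, K=100, lam=1.0, sigma=1.0):
    """One MPPI update of the control sequence u (shape (T, 1))."""
    T = u.shape[0]
    eps = sigma * np.random.randn(K, T, 1)       # v_t = u_t + eps_t, i.e. v_t ~ N(u_t, sigma^2)
    S = np.zeros(K)                              # trajectory costs S(V_k)
    state0 = env.unwrapped.state.copy()          # current pendulum state [theta, theta_dot]
    for k in range(K):
        env.unwrapped.state = state0.copy()      # start each rollout from the current state
        for t in range(T):
            _, reward, _, _ = env.unwrapped.step(u[t] + eps[k, t])
            S[k] -= reward                       # Gym returns rewards, MPPI expects costs
    env.unwrapped.state = state0                 # restore the real state
    beta = S.min()                               # baseline for numerical stability
    w = np.exp(-(S - beta) / lam)
    w /= w.sum()                                 # Monte-Carlo estimate of the 1/eta normalization
    return u + np.einsum('k,ktj->tj', w, eps)    # u_t <- u_t + sum_k w_k * eps_t^k

env = gym.make("Pendulum-v0")
env.reset()
u = np.zeros((20, 1))                            # planning horizon T = 20
for _ in range(200):
    u = mppi_step(env, u)
    env.unwrapped.step(u[0])                     # apply the first control to the real system
    u = np.roll(u, -1, axis=0)                   # shift the plan (receding horizon)
    u[-1] = 0.0
```

Sampling, cost evaluation, exponential weighting, and the weighted noise average map one-to-one onto the bullet points above; subtracting the minimum cost β before exponentiating leaves the normalized weights unchanged but avoids numerical underflow.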