This repository contains baseline implementations of several methods, used to better understand and compare against a proposed method for imitation learning via diffusion.
Kobilarov M. Cross-entropy motion planning. The International Journal of Robotics Research. 2012 Jun;31(7):855-71.
Botev ZI, Kroese DP, Rubinstein RY, L’Ecuyer P. The cross-entropy method for optimization. In: Handbook of Statistics. 2013 Jan 1 (Vol. 31, pp. 35-59). Elsevier.
1. Assuming that actions are conditioned on the current state and are normally distributed, choose initial parameters $\mu^{(0)}$ and $\sigma^{(0)}$; set $t = 1$.
2. Sample $N$ actions $X_1, X_2, \dots, X_N$ from the Gaussian distribution with mean $\mu^{(t-1)}$ and standard deviation $\sigma^{(t-1)}$.
3. Select the best $N_e$ samples (the elite set) and use them to compute the updated parameters $\mu^{(t)}, \sigma^{(t)}$ (this can also be done recursively).
4. Stop if the convergence criteria are satisfied; otherwise, increase $t$ by 1 and repeat from step 2.
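The loop above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the repository's implementation (`cem_cartpole.py` runs on Isaac Gym). It assumes a diagonal Gaussian over a fixed-length action vector, a user-supplied `score_fn` to maximize (state conditioning is handled by closing over the current state), and a stopping rule of $\max_j \sigma_j < \text{tol}$. Step 3 refits each dimension to the elite set $E$ as $\mu^{(t)} = \frac{1}{N_e}\sum_{i \in E} X_i$ and $\sigma^{(t)} = \sqrt{\frac{1}{N_e}\sum_{i \in E} (X_i - \mu^{(t)})^2}$. The names `cem_optimize`, `score_fn`, `n_elite`, and `tol` are illustrative.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def cem_optimize(
    score_fn: Callable[[List[float]], float],
    mu0: Sequence[float],
    sigma0: Sequence[float],
    n_samples: int = 64,
    n_elite: int = 8,
    max_iters: int = 100,
    tol: float = 1e-3,
    seed: int = 0,
) -> Tuple[List[float], List[float], int]:
    """Maximize score_fn over actions with a diagonal-Gaussian cross-entropy method.

    Hypothetical sketch: returns the final mean, standard deviation, and the
    number of iterations that were run.
    """
    if not 0 < n_elite <= n_samples:
        raise ValueError("need 0 < n_elite <= n_samples")
    rng = random.Random(seed)
    mu, sigma = list(mu0), list(sigma0)  # Step 1: initial parameters
    for t in range(1, max_iters + 1):
        # Step 2: draw N candidate actions, each dimension independently Gaussian.
        samples = [
            [rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(n_samples)
        ]
        # Step 3: keep the N_e highest-scoring samples (the elite set) ...
        elites = sorted(samples, key=score_fn, reverse=True)[:n_elite]
        # ... and refit the per-dimension mean and population std to them.
        dims = list(zip(*elites))
        mu = [statistics.fmean(d) for d in dims]
        sigma = [statistics.pstdev(d) for d in dims]
        # Step 4: stop once the distribution has collapsed (assumed criterion).
        if max(sigma) < tol:
            return mu, sigma, t
    return mu, sigma, max_iters
```

For example, with the current state held in a hypothetical variable `state`, `cem_optimize(lambda a: -(a[0] + state) ** 2, [0.0], [1.0])` should return a mean close to `-state`. A smoothed update that blends the new fit with the previous parameters is a common variant. This sketch omits it.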
To run the Isaac Gym version of CEM for CartPole, execute `python cem_cartpole.py` within the `rlgpu` conda environment.