google-deepmind/mujoco_mpc

Explicit MPC for trajectory guidance

MyelinsheathXD opened this issue · 4 comments

Hello! I am trying to optimize the humanoid tracking task for real-time usage. However, iLQG is a little hard to run for some complex tasks.
So far I have implemented an offline-trajectory MLP model trained with regression for the humanoid locomotion task, but it is not trained well enough to balance in all complex states.
However, the MLP model is good enough to generalize the offline trajectories with 80% accuracy.

My goal now is to optimize MuJoCo MPC's iLQG for real-time usage with initial trajectory guidance, or a warm start for the action trajectories,
somewhat similar to this paper:

https://arxiv.org/abs/1907.03613

What are the main things inside iLQG that can be approximately optimized offline, and how can trained models be combined with iLQG so the models can be used in real time?

In the GUI application, see the bottom right live plot. This provides timing for various planner subroutines. You can use this information to inform which parts of iLQG to improve in order to achieve real-time performance for your application.

Components of iLQG that could be learned include: model derivatives (this is the most expensive subroutine for the humanoid walking example) or a value function to be used as a terminal cost term that would enable shorter planning horizons to be effective.

[screenshot: planner timing plot, 2024-05-27]

I have this timing plot for my task.
It shows that the model derivatives are the most time-consuming computation, about 40% of the total iteration time.

Specifically these functions:

`mjd_transitionFD()` → `mjd_stepFD()` (getting the Jacobian) → the part of `mjd_stepFD()` labeled "finite-difference velocities: skip=mjSTAGE_POS", guarded by `if (DyDv || DsDv)`.

After profiling the mjd_transitionFD() function, I found these compute times for the main functions:

- ModelDerivatives::Compute() with 16 threads == 25 ms
- mjd_transitionFD() (single thread) == 5 ms
- mjd_stepFD() (single thread) == 5 ms
- the "finite-difference velocities: skip=mjSTAGE_POS / if (DyDv || DsDv)" part == 1 ms
- every other finite-difference part == 1 ms

After profiling, I am assuming the most time-consuming part of the model-derivative computation is simply applying the finite-difference formula `ds = (s2 - s1) / h` to large amounts of data.
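For reference, a minimal numpy sketch of the column-wise finite-difference scheme that `mjd_stepFD` implements. This is a hypothetical simplification (the real function perturbs qpos, qvel, act, and ctrl separately and uses warm starts); it shows that there is one extra dynamics evaluation per perturbed input dimension, while the `(s2 - s1) / h` step itself is a single cheap vectorized operation:

```python
import numpy as np

def fd_jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian of f at x.

    One extra call to f per input dimension; the subtraction
    (s2 - s1) / h is a cheap vectorized operation per column.
    """
    s1 = f(x)
    J = np.empty((s1.size, x.size))
    for i in range(x.size):
        x2 = x.copy()
        x2[i] += h          # perturb one input dimension
        J[:, i] = (f(x2) - s1) / h
    return J

# Example: for linear dynamics s' = A s, the FD Jacobian recovers A.
A = np.array([[1.0, 0.01],
              [0.0, 0.99]])
f = lambda s: A @ s
J = fd_jacobian(f, np.array([0.3, -0.2]))
assert np.allclose(J, A, atol=1e-5)
```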

In my view, optimizing the function mjd_stepFD() is only possible by utilizing the hardware properly, e.g. SIMD or general-purpose GPU parallel computation.
On the software side, I don't think the finite-difference formula `ds = (s2 - s1) / h` itself can be optimized further, e.g. by using offline-trained models.

In order to optimize the task for real-time usage, I think supplying the right action/control trajectories to the rollout tasks during the initial rollout is the most promising approach, since at the end of the day the action/control trajectories are the data that will be computed in real time.

For now, I am sure the iLQG function can be optimized further using hardware parallelism such as CPU SIMD or general-purpose GPU computation.
On the other hand:
What do you think, can iLQG's model-derivative function be optimized through offline model training?
Which functions of iLQG can be optimized using pretrained models?

The MuJoCo model derivatives (Jacobians) are computed using mjd_transitionFD. This function returns 4 matrices: A, B, C, D. Each of these matrices is a function of the current state $s$ and action $a$. For example: $X = f(s, a)$. The function $f$ comprises multiple calls to MuJoCo's physics and finite-difference computation.

It should be possible to train a model for each matrix: $\hat{X} = g(s, a, \theta)$ with learnable parameters $\theta$, where $\hat{X} \approx X$. Hopefully, evaluation of $g$ is much faster compared to $f$.
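A minimal numpy sketch of this idea, on a hypothetical low-dimensional system (not the humanoid; `true_A` is a smooth stand-in for the `A` matrix that `mjd_transitionFD` would return, and the regressor is a random hidden layer with a trained linear readout, a cheap stand-in for the MLP one would actually train):

```python
import numpy as np

rng = np.random.default_rng(0)
ns, na = 3, 2               # hypothetical state/action dimensions

def true_A(s, a):
    """Stand-in for the A matrix from mjd_transitionFD: a smooth
    function of (s, a), so a regressor can approximate it."""
    return (np.eye(ns)
            + 0.05 * np.outer(np.tanh(s), np.sin(s))
            + 0.02 * a[0] * np.eye(ns))

# Collect a dataset offline: inputs (s, a), targets vec(A).
N = 2000
S = rng.uniform(-1, 1, (N, ns))
U = rng.uniform(-1, 1, (N, na))
Y = np.stack([true_A(s, a).ravel() for s, a in zip(S, U)])

# g(s, a; theta): fixed random hidden layer + trained linear readout.
H = 128
Z = rng.normal(size=(ns + na, H))
b = rng.normal(size=H)
phi = lambda X: np.tanh(X @ Z + b)

X = np.hstack([S, U])
W, *_ = np.linalg.lstsq(phi(X), Y, rcond=None)   # theta = (Z, b, W)

def A_hat(s, a):
    """Learned surrogate: evaluating this is one small matmul,
    instead of many physics steps inside mjd_transitionFD."""
    return (phi(np.r_[s, a][None]) @ W).reshape(ns, ns)

# Evaluate on a held-out point: A_hat should be close to the true A.
s, a = rng.uniform(-1, 1, ns), rng.uniform(-1, 1, na)
err = np.abs(A_hat(s, a) - true_A(s, a)).max()
```

In practice one would train a proper MLP on (state, action) pairs logged from the planner, but the structure is the same: pay for many `mjd_transitionFD` calls once, offline, then evaluate the cheap surrogate online.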


iLQG optimizes fixed-length trajectories with horizon $T$. The complexity of the algorithm is $O(n_u^3 T)$ (i.e., linear in the trajectory horizon). At each iteration, the objective, comprising a summation of stage-cost terms plus a terminal cost, is minimized:
$$\text{minimize} \sum \limits_{t=0}^{T-1} c(s_t, a_t) + c_T(s_T)$$
To reduce the algorithm's complexity, select $H < T$ and train a model $V$ with learnable parameters $\theta$ such that:
$$\sum \limits_{t=0}^{T-1} c(s_t, a_t) + c_T(s_T) \approx \sum \limits_{t=0}^{H-1} c(s_t, a_t) + V(s_H, \theta)$$
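A minimal numpy sketch of fitting such a $V$, under simplifying assumptions: hypothetical linear dynamics with no controls, a quadratic state cost (the action term of $c(s_t, a_t)$ is omitted for brevity), and a quadratic value model $V(s; \theta) = s^\top P s$ regressed on tail costs recorded from offline rollouts:

```python
import numpy as np

rng = np.random.default_rng(0)
ns, T, H = 2, 50, 10        # hypothetical state dim, full and short horizons

def cost(s):
    # Stand-in stage cost c(s, a); the action term is omitted here.
    return float(s @ s)

def step(s):
    # Stand-in stable linear dynamics (no controls, for brevity).
    A = np.array([[0.95, 0.1],
                  [0.0,  0.9]])
    return A @ s

# Offline: roll out from many states, record (s_H, tail cost
# sum_{t=H}^{T-1} c(s_t) + c_T(s_T)), then regress V(s; theta).
S_H, tails = [], []
for _ in range(500):
    s = rng.uniform(-1, 1, ns)
    for t in range(H):
        s = step(s)
    s_H, tail = s.copy(), 0.0
    for t in range(H, T):
        tail += cost(s)
        s = step(s)
    tail += cost(s)                          # terminal cost c_T(s_T)
    S_H.append(s_H)
    tails.append(tail)

# Quadratic value model: V(s) = s^T P s, theta = P fit by least squares.
feats = np.stack([np.outer(s, s).ravel() for s in S_H])
p, *_ = np.linalg.lstsq(feats, np.array(tails), rcond=None)
P = p.reshape(ns, ns)

V = lambda s: float(np.outer(s, s).ravel() @ p)
```

Online, the planner would then optimize only $\sum_{t=0}^{H-1} c(s_t, a_t) + V(s_H)$ over the shorter horizon $H$; for a real task $V$ would be a neural network trained on logged rollouts rather than an exact quadratic.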

Thank you for the deep explanation!