nicklashansen/tdmpc

Typo in the arXiv paper and a question about the notation.

mch5048 opened this issue · 1 comments

Hi, thanks for bringing a new perspective to this field with this paper. I really enjoyed reading it.

While reading through the paper, I found a typo in the inline equation that describes the MPC.

In the Model predictive control subsection under Preliminaries, the paper states that the globally optimal policy
\Pi_{\theta} is proportional to the expectation of the negated Q-values.
Intuitively, I think the negation should be removed, since actions with higher Q-values should be more likely, not less.

My other question concerns the description of MPPI in Sec. 3, specifically Eq. (4), which describes the CEM.

Here, the mean/variance of the j-th policy is computed from the weighted/shifted \Gamma, where the surrounding
paragraph defines \Gamma as a sampled trajectory. I assume the authors mean the state-action sequence by \Gamma.
If so, \Gamma^{\star}_i in Eq. (4) should be replaced with the actions.
Maybe the code snippet here corresponds to this equation?
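
To make the question concrete, here is a minimal NumPy sketch of how I read Eq. (4) when \Gamma^{\star}_i is taken to be the action sequence of the i-th elite trajectory. This is not the repository's code: the function name `mppi_update`, the argument names, and the default values are my own assumptions, and the exponential weighting with a temperature is only my reading of the paper's description.

```python
import numpy as np


def mppi_update(actions, returns, num_elites=2, temperature=0.5):
    """Hypothetical sketch of a weighted mean/std update over elite action sequences.

    actions: array of shape (N, H, A), N sampled action sequences of horizon H.
    returns: array of shape (N,), estimated return of each sampled sequence.
    """
    # Keep the num_elites sequences with the highest estimated return.
    elite_idx = np.argsort(returns)[-num_elites:]
    elite_actions = actions[elite_idx]
    elite_returns = returns[elite_idx]

    # Exponential weights; subtracting the max only improves numerical
    # stability and cancels out after normalization.
    weights = np.exp(temperature * (elite_returns - elite_returns.max()))
    weights = weights / weights.sum()
    w = weights[:, None, None]  # broadcast over horizon and action dimensions

    # Weighted mean and standard deviation of the *actions*, per time step.
    mean = (w * elite_actions).sum(axis=0)
    std = np.sqrt((w * (elite_actions - mean) ** 2).sum(axis=0))
    return mean, std
```

In this reading, only the actions enter the mean and standard deviation; the states visited along each trajectory matter only through the estimated return that sets the weights.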

I wonder if I understood it correctly.
Thanks in advance!

Hi, glad to hear that you find our work interesting, and thank you for the feedback! You are right, I'll make sure to include these things in the next revision. Thanks!