zhouxian/act3d-chained-diffuser

How to jointly optimize action detector and trajectory diffuser?

ManUtdMoon opened this issue · 6 comments

Dear authors,

Thank you for your inspiring work!

I noticed that the ChainedDiffuser paper mentions that you "...train both the action detector and the trajectory diffuser jointly" and "we train the first 2 terms till convergence, and then add the 3rd term for joint optimization". However, I could not find the code for joint optimization, because the only model in main_trajectory.py is a DiffusionPlanner.

Would you please explain more about the actual joint training of Act3d and DiffusionPlanner?

Regards,
Dongjie

Hi, in our latest experiments we found that simply training the models separately yields similar or even better performance, so joint training is no longer needed.

Hi Zhou, thank you for your quick reply!

I think separate training requires relabeling the keyposes. Could you please show me where this is done in the code? Thank you!

The paper mentions that the goal gripper pose is not the ground truth but is predicted by the action detector. Therefore, during training, I think the target keyposes should be relabeled by Act3D rather than taken from the ground truth. Is there something wrong with my understanding?
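For what it's worth, the relabeling idea described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual API: `act3d_predict` and `relabel_keyposes` are hypothetical stand-ins, and the observation/pose dimensions are made up. The point is only that the diffuser would be conditioned on the detector's predicted goal pose rather than the ground-truth keypose, so the two models can be trained separately.

```python
import numpy as np

def act3d_predict(obs_batch):
    """Stand-in for a frozen action detector: maps each observation to a
    7-DoF goal gripper pose (xyz position + quaternion). Here it is just
    a fixed random linear projection, purely for illustration."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((obs_batch.shape[-1], 7))
    return obs_batch @ W

def relabel_keyposes(obs_batch):
    """Replace ground-truth goal keyposes with the detector's predictions,
    so the trajectory diffuser trains on the same goals it will be
    conditioned on at test time."""
    return act3d_predict(obs_batch)

obs = np.zeros((4, 16))        # batch of 4 dummy observations
goals = relabel_keyposes(obs)  # predicted goals used as diffuser conditioning
print(goals.shape)             # (4, 7)
```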

Thank you for your answers! Have a nice day!