Alescontrela/viper_rl

About the video model

yingchengyang opened this issue · 1 comment

Thanks for such wonderful work! I'm curious about the video model. What datasets were used to train it? If I understand correctly, in the DMC setting the video model is trained on expert trajectories from all DMC tasks, such as walker-walk and cheetah-run. Is that right? If so, how can the model generate videos of different tasks that share the same embodiment (like walker-stand and walker-walk)?

Thanks again!

The video model samples consecutive frames, so sampled trajectories for the same embodiment will represent different tasks. One can condition on a task ID or text to sample a video of a particular task.
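The task-ID conditioning described above can be sketched as follows. This is a hypothetical toy illustration, not the viper_rl API: all names (`task_embeddings`, `sample_next_frame`, `rollout`) and dimensions are assumptions. The idea is that a learned per-task embedding is mixed into the frame predictor, so the same model steered by different task IDs produces different trajectories for the same embodiment.

```python
import numpy as np

# Hypothetical sketch (not the viper_rl API): conditioning a video model's
# frame sampler on a task ID so one embodiment yields task-specific videos.

rng = np.random.default_rng(0)

NUM_TASKS = 2   # e.g. walker-stand, walker-walk (assumed)
EMBED_DIM = 8   # task-embedding size (assumed)
FRAME_DIM = 4   # toy stand-in for a frame's latent code
CONTEXT = 3     # number of conditioning frames

# Learned parameters would come from training; random here for illustration.
task_embeddings = rng.normal(size=(NUM_TASKS, EMBED_DIM))
W_ctx = rng.normal(size=(CONTEXT * FRAME_DIM, FRAME_DIM))
W_task = rng.normal(size=(EMBED_DIM, FRAME_DIM))

def sample_next_frame(context_frames, task_id):
    """Predict the next frame from the context window plus a task embedding."""
    ctx = np.concatenate(context_frames)              # flatten context window
    cond = ctx @ W_ctx + task_embeddings[task_id] @ W_task
    return np.tanh(cond)                              # toy frame in [-1, 1]

def rollout(task_id, horizon=5):
    """Autoregressively sample `horizon` frames conditioned on `task_id`."""
    frames = [np.zeros(FRAME_DIM) for _ in range(CONTEXT)]
    for _ in range(horizon):
        frames.append(sample_next_frame(frames[-CONTEXT:], task_id))
    return np.stack(frames[CONTEXT:])

stand = rollout(task_id=0)
walk = rollout(task_id=1)
print(stand.shape)  # (5, 4)
# Different task IDs steer the same model toward different trajectories.
print(not np.allclose(stand, walk))
```

In a real video model the task ID (or a text embedding) would condition the autoregressive transformer in the same spirit: it is an extra input that biases every sampling step, which is what lets a single model trained on mixed-task trajectories generate videos for a specific task on demand.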