Alescontrela/viper_rl

About the video model

yingchengyang opened this issue · 1 comment

Thanks for such wonderful work! I'm curious about the video model. What datasets were used to train it? If I understand correctly, in the DMC setting the video model is trained on expert trajectories from all DMC tasks, such as walker-walk and cheetah-run. Is that right? If so, how can the model generate videos of different tasks that share the same embodiment (like walker-stand and walker-walk)?

Thanks again!

The video model samples consecutive frames, so sampled trajectories for the same embodiment will represent different tasks. One can condition on a task ID or text to sample a video of a particular task.
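The task-ID conditioning described above can be sketched as follows. This is a hypothetical toy illustration, not the viper_rl API: all names (`task_embeddings`, `sample_next_frame`, `rollout`) and dimensions are assumptions. The idea is that a learned per-task embedding is mixed into the frame predictor, so the same model steered by different task IDs produces different trajectories for the same embodiment.

```python
import numpy as np

# Hypothetical sketch (not the viper_rl API): conditioning a video model's
# frame sampler on a task ID so one embodiment yields task-specific videos.

rng = np.random.default_rng(0)

NUM_TASKS = 2   # e.g. walker-stand, walker-walk (assumed)
EMBED_DIM = 8   # task-embedding size (assumed)
FRAME_DIM = 4   # toy stand-in for a frame's latent code
CONTEXT = 3     # number of conditioning frames

# Learned parameters would come from training; random here for illustration.
task_embeddings = rng.normal(size=(NUM_TASKS, EMBED_DIM))
W_ctx = rng.normal(size=(CONTEXT * FRAME_DIM, FRAME_DIM))
W_task = rng.normal(size=(EMBED_DIM, FRAME_DIM))

def sample_next_frame(context_frames, task_id):
    """Predict the next frame from the context window plus a task embedding."""
    ctx = np.concatenate(context_frames)              # flatten context window
    cond = ctx @ W_ctx + task_embeddings[task_id] @ W_task
    return np.tanh(cond)                              # toy frame in [-1, 1]

def rollout(task_id, horizon=5):
    """Autoregressively sample `horizon` frames conditioned on `task_id`."""
    frames = [np.zeros(FRAME_DIM) for _ in range(CONTEXT)]
    for _ in range(horizon):
        frames.append(sample_next_frame(frames[-CONTEXT:], task_id))
    return np.stack(frames[CONTEXT:])

stand = rollout(task_id=0)
walk = rollout(task_id=1)
print(stand.shape)  # (5, 4)
# Different task IDs steer the same model toward different trajectories.
print(not np.allclose(stand, walk))
```

In a real video model the task ID (or a text embedding) would condition the autoregressive transformer in the same spirit: it is an extra input that biases every sampling step, which is what lets a single model trained on mixed-task trajectories generate videos for a specific task on demand.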