dongzhuoyao/motionfm

jittery results


I plugged in a new dataset. After 1800 epochs, the generated motion appears to follow the text conditioning semantically, but the poses are too jittery (please see the attachment). Could you point out what might be wrong?

text: straightening up
https://github.com/dongzhuoyao/motionfm/assets/169649811/b4fa0242-97d5-4080-827f-3578c3cd0d84

thanks!

Hi, thanks for your interest in our work. Could you elaborate on how large your dataset is, what your text encoder is, and how large the network is? Which sampler are you using, and with how many sampling steps?

Thank you for your response. My dataset contains 600 sequences with a total of 34000 frames. My step size is 1, and if a sequence is longer than the maximum number of frames, I randomly select a start index. I am using the framework's default text encoder, i.e. CLIP. Do you think I have too little data? I don't observe this problem when I train using the diffusion model.
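For reference, the cropping described above can be sketched roughly as follows. This is a minimal illustration, not the repository's data loader: the function name `crop_sequence`, the `[frames, features]` array layout, and the value `max_frames=196` are assumptions made for the example, and "step size 1" is read here as taking consecutive frames.

```python
import numpy as np

def crop_sequence(seq, max_frames, rng):
    """Return at most max_frames consecutive frames from seq (shape [T, D])."""
    num_frames = seq.shape[0]
    if num_frames <= max_frames:
        # Short sequences are kept whole (any padding is left to the caller).
        return seq
    # Random start index so that the window fits entirely inside the sequence.
    start = rng.integers(0, num_frames - max_frames + 1)
    # Stride 1: consecutive frames, no temporal subsampling.
    return seq[start:start + max_frames]

rng = np.random.default_rng(0)
# Dummy 300-frame clip with 3 features per frame (hypothetical data).
motion = np.arange(300 * 3, dtype=np.float32).reshape(300, 3)
window = crop_sequence(motion, max_frames=196, rng=rng)
short = crop_sequence(motion[:50], max_frames=196, rng=rng)
```

With this kind of cropping, each epoch sees a different window of every long sequence, so the effective number of distinct training windows is larger than the 600 sequences suggest.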