Masks on clips of varied lengths
csvt32745 opened this issue · 2 comments
Hi~
My VAE training mostly produces mean or static poses.
I found that the VAE transformers take masks to handle clips of varied lengths, but the loss computation doesn't apply them.
Is this normal, or could it heavily affect the results on my small dataset?
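For reference, a minimal sketch of the kind of masking I mean, using a plain PyTorch encoder and made-up lengths (not this repo's actual code):

```python
import torch

# Hypothetical sketch: build a key_padding_mask for clips of varied
# lengths and pass it to a standard PyTorch transformer encoder.
lengths = torch.tensor([16, 40, 64])         # true length of each clip
max_len, feat_dim = int(lengths.max()), 263  # 263 = HumanML3D feature size
x = torch.zeros(len(lengths), max_len, feat_dim)

# True where a position is padding, so attention ignores those frames.
padding_mask = torch.arange(max_len)[None, :] >= lengths[:, None]

layer = torch.nn.TransformerEncoderLayer(d_model=feat_dim, nhead=1, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=1)
out = encoder(x, src_key_padding_mask=padding_mask)  # (3, 64, 263)
```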
Thanks :)
Hi, both the ground truth and the predictions go through the same padding process. The loss part does not need to apply masks, because the network has already set the padded part to zeros.
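As a rough illustration of why the unmasked loss is fine once padding is zeroed on both sides (hypothetical tensors, not the repo's code):

```python
import torch
import torch.nn.functional as F

# If both gt and prediction are zeroed beyond each clip's true length,
# padded positions contribute 0 to the squared error, so an unmasked
# loss yields the same gradients as a masked one.
lengths = torch.tensor([3, 5])
gt = torch.randn(2, 5, 4)
pred = torch.randn(2, 5, 4, requires_grad=True)

pad = torch.arange(5)[None, :, None] >= lengths[:, None, None]  # True on padding
gt_z = gt.masked_fill(pad, 0.0)
pred_z = pred.masked_fill(pad, 0.0)

loss = F.mse_loss(pred_z, gt_z, reduction="sum")
loss.backward()
# Gradients at padded positions are exactly zero:
assert pred.grad.masked_select(pad.expand_as(pred.grad)).abs().max() == 0
```

With `reduction="mean"` the zero terms would still dilute the average, but the per-position gradients on the padding stay exactly zero.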
Your VAE training should not result in static poses; static results mean the training has gone wrong. Please refer to #28.
Please also check your training data, hyperparameters, and the mean/std files for your dataset (like below).
motion-latent-diffusion/README.md, line 196 at c28a064
If you use a new dataset, you should replace the mean/std files.
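For example, a minimal sketch of recomputing those statistics for a new dataset (hypothetical paths and file layout, assuming HumanML3D-style per-clip feature files):

```python
import numpy as np
from pathlib import Path

# Hypothetical sketch: recompute per-dimension mean/std over all motion
# feature files of a new dataset and save them where the config expects.
feature_files = sorted(Path("my_dataset/new_joint_vecs").glob("*.npy"))
all_feats = np.concatenate([np.load(f) for f in feature_files], axis=0)

mean = all_feats.mean(axis=0)
std = all_feats.std(axis=0)
std[std < 1e-8] = 1e-8  # avoid division by zero for constant dimensions

np.save("my_dataset/Mean.npy", mean)
np.save("my_dataset/Std.npy", std)
# Normalization at load time: (features - mean) / std
```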
Thanks for the reply :)
I missed that line, sorry.
I'll check the function and the mean/std files.
Btw, I compute the features and data the same way as HumanML3D.