Masks on clips of varied lengths
csvt32745 opened this issue · 2 comments
Hi~
My VAE training mostly produces mean or static poses.
I found that the VAE transformers take masks to handle clips of varied lengths, but the loss computation doesn't apply them.
Is this normal, or could it heavily affect the results on my small dataset?
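For reference, a minimal sketch of the kind of masking I mean, using a plain PyTorch encoder and made-up lengths (not this repo's actual code):

```python
import torch

# Hypothetical sketch: build a key_padding_mask for clips of varied
# lengths and pass it to a standard PyTorch transformer encoder.
lengths = torch.tensor([16, 40, 64])         # true length of each clip
max_len, feat_dim = int(lengths.max()), 263  # 263 = HumanML3D feature size
x = torch.zeros(len(lengths), max_len, feat_dim)

# True where a position is padding, so attention ignores those frames.
padding_mask = torch.arange(max_len)[None, :] >= lengths[:, None]

layer = torch.nn.TransformerEncoderLayer(d_model=feat_dim, nhead=1, batch_first=True)
encoder = torch.nn.TransformerEncoder(layer, num_layers=1)
out = encoder(x, src_key_padding_mask=padding_mask)  # (3, 64, 263)
```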
Thanks :)
Hi, both the ground truth and the predictions go through the same padding process. The loss part does not need to apply masks, because the network has already set the padded part to zeros.
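As a rough illustration of why the unmasked loss is fine once padding is zeroed on both sides (hypothetical tensors, not the repo's code):

```python
import torch
import torch.nn.functional as F

# If both gt and prediction are zeroed beyond each clip's true length,
# padded positions contribute 0 to the squared error, so an unmasked
# loss yields the same gradients as a masked one.
lengths = torch.tensor([3, 5])
gt = torch.randn(2, 5, 4)
pred = torch.randn(2, 5, 4, requires_grad=True)

pad = torch.arange(5)[None, :, None] >= lengths[:, None, None]  # True on padding
gt_z = gt.masked_fill(pad, 0.0)
pred_z = pred.masked_fill(pad, 0.0)

loss = F.mse_loss(pred_z, gt_z, reduction="sum")
loss.backward()
# Gradients at padded positions are exactly zero:
assert pred.grad.masked_select(pad.expand_as(pred.grad)).abs().max() == 0
```

With `reduction="mean"` the zero terms would still dilute the average, but the per-position gradients on the padding stay exactly zero.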
Your VAE training should not result in static poses; static results mean the training has gone wrong. Please refer to #28.
Please also check your training data, hyperparameters, and the mean/std files for your dataset (like below).
motion-latent-diffusion/README.md, line 196 at c28a064
If you use a new dataset, you should replace the mean/std files.
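For example, a minimal sketch of recomputing those statistics for a new dataset (hypothetical paths and file layout, assuming HumanML3D-style per-clip feature files):

```python
import numpy as np
from pathlib import Path

# Hypothetical sketch: recompute per-dimension mean/std over all motion
# feature files of a new dataset and save them where the config expects.
feature_files = sorted(Path("my_dataset/new_joint_vecs").glob("*.npy"))
all_feats = np.concatenate([np.load(f) for f in feature_files], axis=0)

mean = all_feats.mean(axis=0)
std = all_feats.std(axis=0)
std[std < 1e-8] = 1e-8  # avoid division by zero for constant dimensions

np.save("my_dataset/Mean.npy", mean)
np.save("my_dataset/Std.npy", std)
# Normalization at load time: (features - mean) / std
```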
Thanks for the reply :)
I missed that line, sorry.
I'll check the function and the mean/std files.
Btw, I compute the features and data the same way as HumanML3D.