GuyTevet/MotionCLIP

results on HumanML3D dataset

zshyang opened this issue · 1 comment

Hi Guy Tevet,

I hope this message finds you well, and that recent events, including any conflicts, haven't hindered your research. Best wishes to you and everyone around you.

Looking at the example text file you provide in this repo, I noticed that the dataset's captions seem simpler than those of HumanML3D, which you used in your other work.

With that in mind, I would be grateful if you could answer a couple of questions:

  1. Have you experimented with MotionCLIP on the HumanML3D dataset? If so, could you share your experience? Even brief or approximate insights would be greatly appreciated.
  2. For the dataset used with MotionCLIP, did you try training a standalone decoder that takes CLIP features as input?

I thank you in advance for any insights you might offer.
Best regards.

(1) Indeed, MotionCLIP was trained on BABEL, which has simpler textual labels. We didn't train with HumanML3D, and I expect training on it would yield better results.
(2) No.

Hope it helps:)
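
For reference, here is a minimal sketch of what question (2) describes: a standalone decoder trained to map frozen CLIP text features to a motion sequence. The architecture, layer sizes, and frame count are illustrative assumptions, not MotionCLIP's actual code; the 512-d input matches CLIP ViT-B/32 text embeddings, and the 263-d pose vector follows HumanML3D's representation.

```python
# Hypothetical sketch (not from the MotionCLIP repo): a decoder that maps a
# frozen CLIP text embedding to a fixed-length motion sequence.
import torch
import torch.nn as nn


class CLIPToMotionDecoder(nn.Module):
    def __init__(self, clip_dim=512, latent_dim=256, n_frames=60, pose_dim=263):
        super().__init__()
        self.n_frames = n_frames
        # Project the CLIP embedding into the decoder's latent space.
        self.proj = nn.Linear(clip_dim, latent_dim)
        # Learned queries, one per output frame.
        self.queries = nn.Parameter(torch.randn(n_frames, latent_dim))
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=latent_dim, nhead=4, dim_feedforward=1024, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=4)
        self.out = nn.Linear(latent_dim, pose_dim)

    def forward(self, clip_features):
        # clip_features: [batch, clip_dim], e.g. the output of CLIP's encode_text.
        batch = clip_features.shape[0]
        memory = self.proj(clip_features).unsqueeze(1)          # [batch, 1, latent_dim]
        tgt = self.queries.unsqueeze(0).expand(batch, -1, -1)   # [batch, n_frames, latent_dim]
        hidden = self.decoder(tgt=tgt, memory=memory)            # [batch, n_frames, latent_dim]
        return self.out(hidden)                                   # [batch, n_frames, pose_dim]


if __name__ == "__main__":
    decoder = CLIPToMotionDecoder()
    fake_clip_features = torch.randn(2, 512)  # stand-in for encode_text output
    motion = decoder(fake_clip_features)
    print(motion.shape)  # torch.Size([2, 60, 263])
```

In practice such a decoder would be trained with a reconstruction loss against ground-truth motions, with the CLIP text encoder kept frozen; as noted above, this was not attempted in the MotionCLIP work.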