SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang
If you find our code or paper helpful, please consider starring our repository and citing:
@article{zhong2024smoodi,
title={SMooDi: Stylized Motion Diffusion Model},
author={Zhong, Lei and Xie, Yiming and Jampani, Varun and Sun, Deqing and Jiang, Huaizu},
journal={arXiv preprint arXiv:2407.12783},
year={2024}
}
- Release retargeted 100STYLE dataset.
- Code for inference and pretrained models.
- Evaluation code and metrics.
- Code for training.
We have released the retargeted 100STYLE dataset, mapped to the SMPL skeleton, available on Google Drive.
- Retargeting with Rokoko: We used Rokoko to retarget the 100STYLE motions to the SMPL skeleton template in BVH format. You can refer to this Video Tutorial for a detailed guide on using Rokoko.
- Extracting 3D Joint Positions: After obtaining the retargeted 100STYLE dataset in BVH format, we utilized CharacterAnimationTools to extract 3D joint positions (a sketch of this step is given below).
- Deriving HumanML3D Features: Following the extraction, we used the instructions in the motion_representation.ipynb notebook available in HumanML3D to derive the HumanML3D features.
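The three steps above form a simple data flow: retargeted BVH clips → per-frame 3D joint arrays → HumanML3D feature vectors. Below is a minimal sketch of the joint-extraction step only; the directory names and the load_bvh_joint_positions placeholder stand in for your Rokoko output and the corresponding CharacterAnimationTools call, and are not code from this repository.

```python
from pathlib import Path
import numpy as np

def load_bvh_joint_positions(bvh_path: Path) -> np.ndarray:
    """Placeholder: replace with the CharacterAnimationTools call you use to
    convert a BVH clip into a (num_frames, num_joints, 3) array of global
    joint positions."""
    raise NotImplementedError

BVH_DIR = Path("100STYLE_retargeted_bvh")  # Rokoko output (assumed layout)
OUT_DIR = Path("100STYLE_joints")          # input for motion_representation.ipynb
OUT_DIR.mkdir(parents=True, exist_ok=True)

for bvh_file in sorted(BVH_DIR.glob("*.bvh")):
    joints = load_bvh_joint_positions(bvh_file)   # (T, J, 3) float array
    np.save(OUT_DIR / f"{bvh_file.stem}.npy", joints)
    print(bvh_file.stem, joints.shape)
```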
This code requires:
- Python 3.9
- conda3 or miniconda3
- CUDA capable GPU (one is enough)
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
Setup conda env:
conda env create -f environment.yml
conda activate omnicontrol
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
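Optionally, a quick sanity check (not part of the repository) confirms that the GPU, the spaCy model, and CLIP are all reachable from the activated environment:

```python
# Optional sanity check for the environment created above.
import torch
import spacy
import clip

print("CUDA available:", torch.cuda.is_available())      # requires a CUDA-capable GPU
nlp = spacy.load("en_core_web_sm")                        # downloaded via spacy above
model, preprocess = clip.load("ViT-B/32", device="cpu")   # installed from the CLIP repo
print("CLIP loaded, input resolution:", model.visual.input_resolution)
```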
Download dependencies:
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
HumanML3D - Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
100STYLE - Download the dataset from Google Drive, then copy the files in texts, new_joints, and new_joint_vecs into their corresponding directories within ./dataset/HumanML3D. We use indices larger than 030000 to represent data from the 100STYLE dataset.
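Because both datasets share the same folders, a small check can confirm the merge; the snippet below is only an illustration, with the paths and the 030000 convention taken from this README:

```python
# Count how many samples in the merged dataset come from 100STYLE (index > 030000).
from pathlib import Path

root = Path("./dataset/HumanML3D")
for sub in ("texts", "new_joints", "new_joint_vecs"):
    files = list((root / sub).glob("*"))
    style = [f for f in files if f.stem.isdigit() and int(f.stem) > 30000]
    print(f"{sub}: {len(files)} files total, {len(style)} from 100STYLE")
```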
- Download the model(s) you wish to use, then unzip and place them in `./save/`.
- Download the pretrained model from MLD and then copy it to `./save/`.
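To verify that everything landed under ./save/, you can list checkpoint-like files; the extensions below are guesses, so adjust them to whatever the downloaded archives actually contain:

```python
# List checkpoint-like files under ./save/ (extensions are assumptions).
from pathlib import Path

save_dir = Path("./save")
ckpts = sorted(p for p in save_dir.rglob("*") if p.suffix in {".ckpt", ".pt", ".tar"})
for p in ckpts:
    print(p)
print(f"{len(ckpts)} checkpoint file(s) found under {save_dir}/")
```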
Please add the content text to ./demo/test.txt and the style motion to ./test_motion, then run:
bash demo.sh
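For example, the demo inputs can be prepared as below; the prompt text and the style-motion file name are placeholders, and treating the style motion as a HumanML3D-format .npy file is an assumption to verify against your data:

```python
# Prepare demo inputs: content text in ./demo/test.txt, style motion in ./test_motion.
from pathlib import Path
import shutil

Path("./demo").mkdir(exist_ok=True)
Path("./demo/test.txt").write_text("a person walks forward casually.\n")  # placeholder prompt

Path("./test_motion").mkdir(exist_ok=True)
shutil.copy("path/to/your_style_motion.npy", "./test_motion/")  # placeholder file name
```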
Tips:
- For some motion styles, the default parameter settings may not achieve the desired results. You can modify `guidance_scale_style` in `config_cmld_humanml3d.yaml` to achieve a better balance between content preservation and style reflection.
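If you prefer to script this tweak instead of editing the file by hand, a minimal sketch could look like the following. It assumes guidance_scale_style is a top-level key in config_cmld_humanml3d.yaml and that PyYAML is available; verify both against your setup, and note that yaml.safe_dump drops comments.

```python
# Sketch: override guidance_scale_style in config_cmld_humanml3d.yaml.
import yaml

path = "config_cmld_humanml3d.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["guidance_scale_style"] = 2.5  # example value; tune per style
with open(path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```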
You can train your own model via
bash train.sh
Tips:
- In `config_cmld_humanml3d.yaml`, setting `is_recon: True` means that cycle loss will not be used during training.
- In fact, the improvement in performance from cycle loss is quite limited. If you want to quickly train a model, you can set `is_recon: True`. With this setting, it will take nearly 50 minutes to train 50 epochs on an A5000 GPU and achieve performance nearly equivalent to the second row in Table 3 of our paper.
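Conceptually, the flag only gates whether the cycle term contributes to the training objective. The snippet below is a schematic illustration of that description, not the repository's training code; the names and the weight are made up.

```python
# Schematic only: how the is_recon flag described above gates the cycle loss.
def training_loss(recon_loss: float, cycle_loss: float,
                  is_recon: bool, cycle_weight: float = 1.0) -> float:
    if is_recon:
        # is_recon: True -> reconstruction objective only (faster, nearly as good).
        return recon_loss
    # is_recon: False -> add the cycle-consistency term.
    return recon_loss + cycle_weight * cycle_loss
```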
You can evaluate the model via
bash test.sh
Tips:
- Make sure to set `is_test: True` during evaluation.
- In `config_cmld_humanml3d.yaml`, setting `is_guidance: True` means that classifier-based style guidance will be used during evaluation. If `is_guidance: False`, evaluation will take nearly 50 minutes, whereas it will take 4 hours with `is_guidance: True` on an A5000 GPU.
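The same kind of config override sketched in the demo tips can flip these flags before running bash test.sh (same assumptions: top-level YAML keys, PyYAML installed, comments are lost on rewrite):

```python
# Sketch: set the evaluation flags described above before running test.sh.
import yaml

path = "config_cmld_humanml3d.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["is_test"] = True        # required for evaluation
cfg["is_guidance"] = False   # True enables classifier-based style guidance (slower)
with open(path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```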
Our code is heavily based on MLD.
The motion visualization is based on MLD and TEMOS.
We also thank the following works:
guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi, HumanML3D, OmniControl.
This code is distributed under an MIT LICENSE.
Note that our code depends on several other libraries, including SMPL, SMPL-X, and PyTorch3D, and utilizes the HumanML3D and 100STYLE datasets. Each of these has its own respective license that must also be adhered to.