The official PyTorch implementation of the paper "Human Motion Diffusion as a Generative Prior" (ArXiv).
Please visit our webpage for more details.
| | Training | Generation | Evaluation |
|---|---|---|---|
| DoubleTake (long motion) | ✅ | ✅ | ETA May 23 |
| ComMDM (two-person) | ✅ | ✅ | ✅ |
| Fine-tuned motion control | ETA May 23 | ETA May 23 | ETA May 23 |
📢 14/Apr/2023 - First release - DoubleTake/ComMDM - Training and generation with pre-trained models are available.
This code was tested on Ubuntu 18.04.5 LTS and requires:
- Python 3.8
- conda3 or miniconda3
- CUDA capable GPU (one is enough)
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
Setup conda env:
conda env create -f environment.yml
conda activate PriorMDM
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
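After the environment is set up, a quick import check can confirm that the packages installed above are visible to the active interpreter. The snippet below is a minimal sketch, not part of the repository; the import names (`torch`, `spacy`, `en_core_web_sm`, `clip`, `smplx`) are assumptions about how these packages are exposed, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumption: these are the import names of the packages installed above.
REQUIRED_MODULES = ["torch", "spacy", "en_core_web_sm", "clip", "smplx"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run it inside the activated PriorMDM environment; an empty result only shows that the modules can be located, not that their versions match environment.yml.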
PriorMDM shares most of its dependencies with the original MDM. If you already have an installed MDM from the official repo, you can save time and link the dependencies instead of getting them from scratch.
If you already have an installed MDM
Link from installed MDM
Before running the following bash script, change the path inside it to the full path of your installed MDM:
bash prepare/link_mdm.sh
First time user
Download dependencies:
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
Get HumanML3D dataset (For all applications):
Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
DoubleTake (Long sequences)
BABEL dataset
Download the processed version here, and place it at ./dataset/babel
SMPLH dependencies
Download here, and place it at ./bodymodels
ComMDM (two-person)
3DPW dataset
For ComMDM, we cleaned 3DPW and converted it to HumanML3D format.
Download the processed version here, and place it at ./dataset/3dpw
Download the model(s) you wish to use, then unzip and place them in ./save/.
DoubleTake (long motions)
- my_humanml-encoder-512 (This is a reproduction of MDM best model without any changes)
- Babel_TrasnEmb_GeoLoss
ComMDM (two-person)
- pw3d_text (for text-to-motion)
- pw3d_prefix (for prefix completion)
Fine-tuned motion control - ETA May 23
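Since the datasets, body models, and checkpoints each go to a specific folder, it can help to verify the layout before sampling or training. The following is a minimal sketch that only checks the locations named above, relative to the repository root; `missing_paths` is a hypothetical helper and it does not validate the contents of any folder.

```python
from pathlib import Path

# Locations taken from the instructions above, relative to the repository root.
EXPECTED_PATHS = {
    "HumanML3D dataset": "dataset/HumanML3D",
    "BABEL dataset (DoubleTake)": "dataset/babel",
    "SMPLH dependencies (DoubleTake)": "bodymodels",
    "3DPW dataset (ComMDM)": "dataset/3dpw",
    "pretrained models": "save",
}


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Return 'label: path' strings for every expected location that does not exist."""
    root = Path(repo_root)
    return [
        f"{label}: {rel}"
        for label, rel in expected.items()
        if not (root / rel).exists()
    ]


if __name__ == "__main__":
    for entry in missing_paths():
        print("Not found ->", entry)
```

Only the entries relevant to the application you plan to run need to exist; for example, ComMDM does not require the BABEL data.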
DoubleTake (long motions)
Reproduce random text prompts:
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_samples 4 --handshake_size 20 --blend_len 10
Reproduce out of text file:
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_text_example.txt
Reproduce out of csv file (can determine each sequence length):
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_csv_example.csv
It will look something like this:
ComMDM (two-person)
Text-to-Motion
Reproduce paper text prompts:
python -m sample.two_person_text2motion --model_path ./save/pw3d_text/model000100000.pt --input_text ./assets/two_person_text_prompts.txt
It will look something like this:
Prefix completion
Complete unseen motion prefixes:
python -m sample.two_person_prefix_completion --model_path ./save/pw3d_prefix/model000050000.pt
It will look something like this:
Blue frames are the input prefix and orange frames are the generated completion.
Visualize dataset
Unfortunately, the 3DPW dataset is not clean, even after our processing. To view samples from it, run:
python -m sample.two_person_text2motion --model_path ./save/humanml_trans_enc_512/model000200000.pt --sample_gt
Fine-tuned motion control - ETA May 23
You may also define:
- --device id.
- --seed to sample different prompts.
- --motion_length (text-to-motion only) in seconds (maximum is 9.8[sec]).
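To relate the --motion_length limit to frame counts, the arithmetic can be sketched as below. This assumes the HumanML3D frame rate of 20 fps, under which 9.8 seconds corresponds to 196 frames; `seconds_to_frames` is a hypothetical helper, and the repository may round or clip differently.

```python
HUMANML3D_FPS = 20   # assumption: HumanML3D motions are sampled at 20 frames per second
MAX_SECONDS = 9.8    # maximum stated for --motion_length


def seconds_to_frames(seconds, fps=HUMANML3D_FPS, max_seconds=MAX_SECONDS):
    """Convert a requested duration to a frame count, clipped to the stated maximum."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    clipped = min(seconds, max_seconds)
    return int(round(clipped * fps))


if __name__ == "__main__":
    for requested in (3.0, 6.0, 9.8, 12.0):
        print(requested, "sec ->", seconds_to_frames(requested), "frames")
```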
Running those will get you:
- results.npy file with text prompts and xyz positions of the generated animation
- sample##_rep##.mp4 - a stick figure animation for each generated motion.
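To see what a generated results.npy contains, it can be loaded with NumPy. The sketch below assumes the file stores a single pickled Python dict (which is why `allow_pickle=True` and `.item()` are used); it makes no assumption about the key names and simply reports each key with its array shape or type. `summarize_results` is a hypothetical helper, and pickled files should only be loaded from sources you trust.

```python
import numpy as np


def summarize_results(path):
    """Return {key: array shape or type name} for a results file holding a pickled dict."""
    # Assumption: the file was written with np.save on a dict, so it loads as a
    # 0-d object array that must be unwrapped with .item().
    data = np.load(path, allow_pickle=True).item()
    summary = {}
    for key, value in data.items():
        if isinstance(value, np.ndarray):
            summary[key] = value.shape
        else:
            summary[key] = type(value).__name__
    return summary
```

For example, `summarize_results("/path/to/results.npy")` returns a small dict that can be printed to find the entries holding the text prompts and the xyz positions.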
To create SMPL mesh per frame run:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
This script outputs:
- sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces)
- sample##_rep##_obj - Mesh per frame in .obj format.
Notes:
- The .obj can be integrated into Blender/Maya/3DS-MAX and rendered using them.
- This script runs SMPLify and needs a GPU as well (can be specified with the --device flag).
- Important - Do not change the original .mp4 path before running the script.
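Because the output names are derived from the stick figure file name, the expected outputs for a given .mp4 can be computed up front. This is a minimal sketch that assumes the outputs are written next to the input .mp4, which is an assumption and not something stated above; `expected_mesh_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_mesh_outputs(mp4_path):
    """Return (smpl_params_file, obj_dir) expected for a sample##_rep##.mp4 input."""
    mp4 = Path(mp4_path)
    if mp4.suffix != ".mp4":
        raise ValueError("expected a path to a .mp4 stick figure file")
    stem = mp4.stem  # e.g. sample00_rep00
    # Assumption: both outputs live in the same directory as the input video.
    params_file = mp4.with_name(stem + "_smpl_params.npy")
    obj_dir = mp4.with_name(stem + "_obj")
    return params_file, obj_dir
```

Checking whether both paths exist after running the script is a simple way to confirm that the rendering finished.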
Notes for 3d makers:
- You have two ways to animate the sequence:
  - Use the SMPL add-on and the theta parameters saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
  - A more straightforward way is using the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe vertex locations. Since the OBJs do not preserve vertex order, we also save this data to the sample##_rep##_smpl_params.npy file for your convenience.
DoubleTake (long motions)
HumanML3D best model
Retraining HumanML3D is not needed as we use the original trained model from MDM. Yet, for completeness, this repository supports this training as well:
python -m train.train_mdm --save_dir save/my_humanML_bestmodel --dataset humanml
Babel best model
python -m train.train_mdm --save_dir ./save/my_Babel_TrasnEmb_GeoLoss --dataset babel --latent_dim 512 --batch_size 64 --diffusion_steps 1000 --num_steps 10000000 --min_seq_len 45 --max_seq_len 250 --lambda_rcxyz 1.0 --lambda_fc 1.0 --lambda_vel 1.0
ComMDM (two-person)
Text-to-Motion
Download the pretrained model for text-to-motion training from here and place it in ./save/. Then train with:
python -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512/model000200000.pt --multi_train_mode text --multi_train_splits train,validation --save_dir ./save/my_pw3d_text
Prefix Completion
Download the pretrained model for prefix training from here and place it in ./save/. Then train with:
python -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512_prefix_finetune/model000330000.pt --multi_train_mode prefix --save_dir ./save/my_pw3d_prefix --save_interval 10000
Fine-tuned motion control - ETA May 23
- Use --device to define GPU id.
- Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.
- Add --eval_during_training to run a short evaluation for each saved checkpoint. This will slow down training but will give you better monitoring.
DoubleTake (long motions) - ETA May 23
ComMDM (two-person)
The reported evaluation for prefix completion is in ./save/pw3d_prefix/eval_prefix_pw3d_paper_results_000240000_wo_mm_1000samples.log.
To reproduce evaluation run:
python -m eval.eval_multi --model_path ./save/pw3d_prefix/model000240000.pt
Fine-tuned motion control - ETA May 23
This code is standing on the shoulders of giants. We want to thank the following contributors whose work our code is based on:
MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, TEACH.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.