PriorMDM: Human Motion Diffusion as a Generative Prior

The official PyTorch implementation of the paper "Human Motion Diffusion as a Generative Prior" (arXiv).

Please visit our webpage for more details.

teaser

Release status

                              Training      Generation    Evaluation
DoubleTake (long motion)      Available     Available     ETA May 23
ComMDM (two-person)           Available     Available     Available
Fine-tuned motion control     ETA May 23    ETA May 23    ETA May 23

News

📢 14/Apr/2023 - First release - DoubleTake/ComMDM - Training and generation with pre-trained models are available.

Getting started

This code was tested on Ubuntu 18.04.5 LTS and requires:

  • Python 3.8
  • conda (Anaconda3 or Miniconda3)
  • CUDA capable GPU (one is enough)

1. Setup environment

Install ffmpeg (if not already installed):

sudo apt update
sudo apt install ffmpeg

For Windows, use this instead.

Setup conda env:

conda env create -f environment.yml
conda activate PriorMDM
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
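
Optionally, verify that the environment is working before moving on. A minimal sanity check (nothing repo-specific is assumed, only the packages installed above):

# Quick environment check -- run inside the PriorMDM conda env.
import torch        # installed via environment.yml
import clip         # installed from openai/CLIP above
import smplx        # installed from GuyTevet/smplx above
import spacy

print("CUDA available:", torch.cuda.is_available())
nlp = spacy.load("en_core_web_sm")   # the spaCy model downloaded above
print("spaCy model OK:", nlp("a person walks forward and waves")[0].text)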

2. Get MDM dependencies

PriorMDM shares most of its dependencies with the original MDM. If you already have MDM installed from the official repo, you can save time by linking those dependencies instead of downloading them from scratch.

If you already have an installed MDM

Link from installed MDM

Before running the following bash script, first change the path in it to the full path of your installed MDM:

bash prepare/link_mdm.sh

First time user

Download dependencies:

bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh

Get the HumanML3D dataset (for all applications):

Follow the instructions in HumanML3D, then copy the resulting dataset into this repository:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

3. Get PriorMDM dependencies

DoubleTake (Long sequences)

BABEL dataset

Download the processed version here, and place it at ./dataset/babel

SMPLH dependencies

Download here, and place it at ./bodymodels

ComMDM (two-person)

3DPW dataset

For ComMDM, we cleaned 3DPW and converted it to HumanML3D format.

Download the processed version here, and place it at ./dataset/3dpw

4. Download the pretrained models

Download the model(s) you wish to use, then unzip and place them in ./save/ (a quick path check is sketched after the list below).

DoubleTake (long motions)
ComMDM (two-person)

Fine-tuned motion control - ETA May 23
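
Optionally, before moving on to motion synthesis, you can check that everything from steps 2-4 is where the scripts expect it. A minimal sketch using the paths given in this README (the checkpoint paths depend on which models you downloaded, so keep only the relevant lines):

# Optional: verify that dependencies from steps 2-4 are in the expected locations.
import os

expected = [
    "./dataset/HumanML3D",                                # HumanML3D (all applications)
    "./dataset/babel",                                    # DoubleTake: BABEL
    "./bodymodels",                                       # DoubleTake: SMPLH dependencies
    "./dataset/3dpw",                                     # ComMDM: processed 3DPW
    "./save/my_humanml_trans_enc_512/model000200000.pt",  # checkpoint used by the DoubleTake commands below
    "./save/pw3d_text/model000100000.pt",                 # ComMDM text-to-motion checkpoint
    "./save/pw3d_prefix/model000050000.pt",               # ComMDM prefix-completion checkpoint
]

for path in expected:
    status = "OK     " if os.path.exists(path) else "MISSING"
    print(status, path)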

Motion Synthesis

DoubleTake (long motions)

Generate from random text prompts:

python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_samples 4 --handshake_size 20 --blend_len 10

Generate from a text file:

python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_text_example.txt 

Generate from a .csv file (which also specifies the length of each sequence):

python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_csv_example.csv 

It will look something like this:

example

ComMDM (two-person)

Text-to-Motion

Reproduce the paper's text prompts:

python -m sample.two_person_text2motion --model_path ./save/pw3d_text/model000100000.pt --input_text ./assets/two_person_text_prompts.txt

It will look something like this:

example

Prefix completion

Complete unseen motion prefixes:

python -m sample.two_person_prefix_completion --model_path ./save/pw3d_prefix/model000050000.pt

It will look something like this:

example

Blue frames are the input prefix and orange frames are the generated completion.

Visualize dataset

Unfortunately, the 3DPW dataset is not clean, even after our processing. To visualize samples from it, run:

python -m sample.two_person_text2motion --model_path ./save/humanml_trans_enc_512/model000200000.pt --sample_gt

Fine-tuned motion control - ETA May 23

You may also define:

  • --device id.
  • --seed to sample different prompts.
  • --motion_length (text-to-motion only) in seconds (maximum is 9.8 seconds).

Running these will get you:

  • results.npy - a file with the text prompts and xyz positions of the generated animations (see the loading sketch after this list).
  • sample##_rep##.mp4 - a stick figure animation for each generated motion.
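
To inspect results.npy programmatically, here is a minimal sketch. It assumes the MDM convention of saving a pickled dict; the exact key names (e.g. "text", "motion") may differ, so print them first:

import numpy as np

# results.npy is saved as a pickled dict (MDM convention); list its keys before assuming names.
data = np.load("results.npy", allow_pickle=True).item()
print("keys:", list(data.keys()))

# The text prompts and xyz joint positions described above are typically stored
# under keys like "text" and "motion" -- adjust to whatever the print shows.
if "motion" in data:
    print("motion shape:", np.asarray(data["motion"]).shape)
if "text" in data:
    print("first prompt:", data["text"][0])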

Render SMPL mesh

To create an SMPL mesh per frame, run:

python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file

This script outputs:

  • sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces)
  • sample##_rep##_obj - Mesh per frame in .obj format.

Notes:

  • The .obj files can be imported into Blender/Maya/3DS-MAX and rendered there.
  • This script runs SMPLify and also needs a GPU (which can be specified with the --device flag).
  • Important - Do not change the original .mp4 path before running the script.

Notes for 3d makers:

  • You have two ways to animate the sequence:
    1. Use the SMPL add-on and the theta parameters saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
    2. A more straightforward way is to use the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe the vertex locations. Since the OBJs do not preserve vertex order, we also save this data to the sample##_rep##_smpl_params.npy file for your convenience (see the loading sketch after this list).
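
If you want to feed the saved SMPL data into your own export or keyframing script, here is a minimal loading sketch. It assumes the file is a pickled dict containing the thetas, root translations, vertices and faces listed above; print the keys to confirm their exact names:

import numpy as np

# Load the saved SMPL data (replace the path with your actual sample##_rep##_smpl_params.npy file).
params = np.load("sample00_rep00_smpl_params.npy", allow_pickle=True).item()

# List every entry and its shape before accessing anything by name.
for key, value in params.items():
    shape = value.shape if hasattr(value, "shape") else type(value).__name__
    print(key, shape)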

Train your own PriorMDM

DoubleTake (long motions)

HumanML3D best model

Retraining on HumanML3D is not needed, as we use the original trained model from MDM. Yet, for completeness, this repository supports this training as well:

python -m train.train_mdm --save_dir save/my_humanML_bestmodel --dataset humanml 

Babel best model

python -m train.train_mdm --save_dir ./save/my_Babel_TrasnEmb_GeoLoss --dataset babel --latent_dim 512 --batch_size 64 --diffusion_steps 1000 --num_steps 10000000 --min_seq_len 45 --max_seq_len 250 --lambda_rcxyz 1.0 --lambda_fc 1.0 --lambda_vel 1.0

ComMDM (two-person)

Text-to-Motion

Download the pretrained model for text-to-motion training from here and place it in ./save/. Then train with:

python -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512/model000200000.pt --multi_train_mode text --multi_train_splits train,validation --save_dir ./save/my_pw3d_text

Prefix Completion

Download the pretrained model for prefix training from here and place it in ./save/. Then train with:

python -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512_prefix_finetune/model000330000.pt --multi_train_mode prefix --save_dir ./save/my_pw3d_prefix --save_interval 10000

Fine-tuned motion control - ETA May 23

  • Use --device to define GPU id.
  • Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.
  • Add --eval_during_training to run a short evaluation for each saved checkpoint. This will slow down training but will give you better monitoring.

Evaluate

DoubleTake (long motions) - ETA May 23

ComMDM (two-person)

The reported evaluation for prefix completion is in ./save/pw3d_prefix/eval_prefix_pw3d_paper_results_000240000_wo_mm_1000samples.log.

To reproduce the evaluation, run:

python -m eval.eval_multi --model_path ./save/pw3d_prefix/model000240000.pt

Fine-tuned motion control - ETA May 23

Acknowledgments

This code stands on the shoulders of giants. We thank the following works that our code is based on:

MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, TEACH.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.