Official PyTorch implementation of the paper "TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis" (ICCV 2023).
Please visit our webpage for more details.
If you find this code useful in your research, please cite:
@inproceedings{petrovich23tmr,
title = {{TMR}: Text-to-Motion Retrieval Using Contrastive {3D} Human Motion Synthesis},
author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
booktitle = {International Conference on Computer Vision ({ICCV})},
year = 2023
}
and if you use the re-implementation of TEMOS of this repo, please cite:
@inproceedings{petrovich22temos,
title = {{TEMOS}: Generating diverse human motions from textual descriptions},
author = {Petrovich, Mathis and Black, Michael J. and Varol, G{\"u}l},
booktitle = {European Conference on Computer Vision ({ECCV})},
year = 2022
}
You can also give the repo a star ⭐ if the code is useful to you.
Create environment
Create a Python virtual environment:
python -m venv ~/.venv/TMR
source ~/.venv/TMR/bin/activate
Install PyTorch
python -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
Then install remaining packages:
python -m pip install -r requirements.txt
which correspond to the packages: pytorch_lightning, einops, hydra-core, hydra-colorlog, orjson, tqdm, scipy. The code was tested on Python 3.10.12 and PyTorch 2.0.1.
Set up the datasets
The process is a little different from other repos because we need a common representation for HumanML3D, KIT-ML and BABEL (to be able to train on one and evaluate on another). If you are curious about the details, I recommend reading this file: DATASETS.md. I also put the bibtex files of the datasets, which I recommend you cite.
Please follow the instructions of the raw_pose_processing.ipynb notebook of the HumanML3D repo to get the pose_data folder.
Then copy or symlink the pose_data folder into datasets/motions/:
ln -s /path/to/HumanML3D/pose_data datasets/motions/pose_data
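If you prefer to create the link from Python (for example inside your own setup script), a minimal sketch with pathlib could look like the one below. The function name link_pose_data is hypothetical and not part of the repo. Like ln -s, it only creates a symlink, and it leaves an existing datasets/motions/pose_data untouched.

```python
from pathlib import Path


def link_pose_data(src: str, dst_dir: str = "datasets/motions") -> Path:
    """Create dst_dir/pose_data as a symlink to src (hypothetical helper, like `ln -s`)."""
    src_path = Path(src).expanduser().resolve()
    if not src_path.is_dir():
        raise FileNotFoundError(f"pose_data folder not found: {src_path}")
    dst = Path(dst_dir) / "pose_data"
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.is_symlink() or dst.exists():
        # Do not overwrite an existing link or a copied folder
        return dst
    dst.symlink_to(src_path, target_is_directory=True)
    return dst
```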
Run the following command to compute the HumanML3D Guo features on the whole AMASS (+HumanAct12) dataset.
python -m prepare.compute_guoh3dfeats
It should compute the features (and their mirrored version) and save them in datasets/motions/guoh3dfeats.
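At the joint level, mirroring a motion usually means negating the lateral axis and then swapping the left and right joints, so that a left-hand wave becomes a right-hand wave. The sketch below illustrates that idea on raw joint positions of shape (T, 22, 3), assuming the 22-joint SMPL order used by HumanML3D and x as the lateral axis. It is not the code of prepare.compute_guoh3dfeats, and the names LEFT, RIGHT and mirror_joints are hypothetical.

```python
import numpy as np

# Assumed SMPL 22-joint order: each LEFT[i] is paired with RIGHT[i]
# (hips, knees, ankles, feet, collars, shoulders, elbows, wrists)
LEFT = [1, 4, 7, 10, 13, 16, 18, 20]
RIGHT = [2, 5, 8, 11, 14, 17, 19, 21]


def mirror_joints(joints: np.ndarray) -> np.ndarray:
    """Mirror a (T, 22, 3) joint sequence across the x = 0 plane."""
    mirrored = joints.copy()
    mirrored[..., 0] *= -1  # flip the lateral axis
    # After the flip, old left joints sit on the right side: swap their labels.
    # The right-hand side is a fancy-indexed copy, so the swap is safe.
    mirrored[:, LEFT + RIGHT] = mirrored[:, RIGHT + LEFT]
    return mirrored
```

Applying mirror_joints twice returns the original sequence, which is a handy sanity check for any mirroring routine.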
Run this command to compute the sentence embeddings and token embeddings used in TMR for each dataset.
python -m prepare.text_embeddings data=humanml3d
This will save:
- the token embeddings of distilbert in datasets/annotations/humanml3d/token_embeddings
- the sentence embeddings of all-mpnet-base-v2 in datasets/annotations/humanml3d/sent_embeddings
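The two outputs differ in granularity: token embeddings keep one vector per token, while a sentence embedding summarizes the whole description in a single vector. Sentence-transformers models such as all-mpnet-base-v2 obtain that vector by mean pooling their own token outputs. The NumPy sketch below illustrates only this pooling step on made-up inputs; mean_pool is a hypothetical name and this is not how the repo calls either model.

```python
import numpy as np


def mean_pool(token_emb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Collapse (n_tokens, dim) token embeddings into one (dim,) sentence vector.

    mask is 1 for real tokens and 0 for padding, so padding does not
    contribute to the average.
    """
    mask = mask.astype(token_emb.dtype)[:, None]  # (n_tokens, 1)
    summed = (token_emb * mask).sum(axis=0)
    count = max(float(mask.sum()), 1e-9)  # avoid division by zero on empty input
    return summed / count
```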
To get statistics of the motion distribution for each dataset, you can run the following command. The statistics are already included in the repo, so you don't have to. They are computed on the training set.
python -m prepare.motion_stats data=humanml3d
It will save the statistics (mean.pt and std.pt) in the folder stats/humanml3d/guoh3dfeats. You can replace data=humanml3d with data=kitml or data=babel anywhere in this repo.
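Statistics like these are typically used to standardize each feature dimension before it enters the model, and to undo the standardization on generated motions. A minimal NumPy sketch, assuming mean and std are per-dimension vectors computed over all frames of the training motions. The repo stores them as mean.pt and std.pt; the pooling over frames, the eps value and the function names here are assumptions.

```python
import numpy as np


def compute_stats(train_feats: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Per-dimension mean and std over all frames of all training motions."""
    frames = np.concatenate(train_feats, axis=0)  # (total_frames, n_feats)
    return frames.mean(axis=0), frames.std(axis=0)


def normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # eps guards against constant dimensions whose std is 0
    return (x - mean) / (std + eps)


def unnormalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Exact inverse of normalize (same eps)
    return x * (std + eps) + mean
```

Computing the statistics on the training set only keeps validation and test motions out of the preprocessing.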
python train.py [OPTIONS]
Details
By default, it will train TMR on HumanML3D and store the run in the folder outputs/tmr_humanml3d_guoh3dfeats, which I will call RUN_DIR.
The other options are:
- model=tmr: TMR (by default)
- model=temos: TEMOS
- data=humanml3d: HumanML3D (by default)
- data=kitml: KIT-ML
- data=babel: BABEL
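The options are key=value overrides (hydra-core is among the dependencies), so a model and a dataset can be combined on one command line. The hypothetical helper below is not part of the repo. It only builds one training command per model and dataset pair, which can be handy when scripting several runs; it does not launch anything.

```python
import itertools
import shlex

MODELS = ["tmr", "temos"]
DATASETS = ["humanml3d", "kitml", "babel"]


def train_commands(models=MODELS, datasets=DATASETS) -> list[str]:
    """One `python train.py` invocation per (model, dataset) pair."""
    return [
        shlex.join(["python", "train.py", f"model={m}", f"data={d}"])
        for m, d in itertools.product(models, datasets)
    ]


if __name__ == "__main__":
    for cmd in train_commands():
        print(cmd)
```

For instance, the second command produced is python train.py model=tmr data=kitml.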
Extracting weights
After training, run the following command to extract the weights from the checkpoint:
python extract.py run_dir=RUN_DIR
It will take the last checkpoint by default. This should create the folder RUN_DIR/last_weights and populate it with the files motion_decoder.pt, motion_encoder.pt and text_encoder.pt.
This makes loading models faster: loading no longer depends on the file structure, and each module can be loaded independently. This is already done for the pretrained models.
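A quick way to confirm that the extraction worked is to check that the three files exist. The helper below is hypothetical and not part of the repo. It only checks for the files and assumes nothing about their contents; loading them is left to the repo's own code.

```python
from pathlib import Path

EXPECTED = ("motion_decoder.pt", "motion_encoder.pt", "text_encoder.pt")


def missing_weights(run_dir: str) -> list[str]:
    """Return the expected weight files that are absent from RUN_DIR/last_weights."""
    weights_dir = Path(run_dir) / "last_weights"
    return [name for name in EXPECTED if not (weights_dir / name).is_file()]
```

An empty list means all three modules were extracted.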
bash prepare/download_pretrain_models.sh
This will put the pretrained models in the models folder.
Currently, there are:
- TMR trained on HumanML3D with Guo et al. humanml3d features: models/tmr_humanml3d_guoh3dfeats
- TMR trained on KIT-ML with Guo et al. humanml3d features: models/tmr_kitml_guoh3dfeats
Note that KIT-ML is used with the Guo et al. humanml3d features (this is not a mistake). The motions come from AMASS and are converted (I am not using the MMM joints from the original KIT-ML). This makes the two models work in the same motion space.
More models may be available later on.
python retrieval.py run_dir=RUN_DIR
It will compute the metrics, show them and save them in the folder RUN_DIR/contrastive_metrics/.
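Retrieval results are commonly reported as Recall@K: the percentage of queries whose ground-truth match is ranked in the top K. Below is a minimal sketch of text-to-motion Recall@K, assuming a square similarity matrix in which text i is paired with motion i. It illustrates the basic metric only. It is not the code that writes RUN_DIR/contrastive_metrics/, which may use other evaluation protocols, and recall_at_k is a hypothetical name.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, ks=(1, 2, 3, 5, 10)) -> dict[int, float]:
    """Text-to-motion Recall@K (in %) from an (n, n) text-by-motion similarity matrix.

    Assumes the ground-truth pairs are on the diagonal (text i <-> motion i).
    """
    order = np.argsort(-sim, axis=1)  # motions sorted from most to least similar
    # Position of the ground-truth motion in each ranking (0 = retrieved first)
    ranks = np.argmax(order == np.arange(len(sim))[:, None], axis=1)
    return {k: 100.0 * float(np.mean(ranks < k)) for k in ks}
```

For example, with sim = [[0.5, 0.9], [0.1, 0.8]] the first text ranks its own motion second, so Recall@1 is 50% and Recall@2 is 100%.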
Note that the .npy file should correspond to HumanML3D Guo features.
python encode_motion.py run_dir=RUN_DIR npy=/path/to/motion.npy
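For HumanML3D, a Guo feature file has shape (n_frames, 263) for the 22-joint skeleton: 4 root values, 63 local joint positions, 126 joint rotations, 66 joint velocities and 4 foot-contact flags. A hypothetical pre-check like the one below can catch a wrong file (for example raw joint positions) before calling encode_motion.py. The helper is not part of the repo.

```python
import numpy as np

GUO_FEATS_DIM = 263  # 4 + 63 + 126 + 66 + 4 for the 22-joint HumanML3D skeleton


def load_guo_features(path: str) -> np.ndarray:
    """Load a motion .npy file and check it looks like (n_frames, 263) Guo features."""
    motion = np.load(path)
    if motion.ndim != 2 or motion.shape[1] != GUO_FEATS_DIM:
        raise ValueError(
            f"expected shape (n_frames, {GUO_FEATS_DIM}), got {motion.shape}; "
            "is this a joint-position array instead of Guo features?"
        )
    return motion
```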
python encode_text.py run_dir=RUN_DIR text="A person is walking forward."
python text_motion_sim.py run_dir=RUN_DIR text=TEXT npy=/path/to/motion.npy
For example, with text="a man sets to do a backflips then fails back flip and falls to the ground" and npy=HumanML3D/HumanML3D/new_joint_vecs/001034.npy, you should get a score of around 0.96.
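The score compares the latent vector of the text with the latent vector of the motion. Assuming it is a cosine similarity (the script may additionally rescale it, which this sketch does not do), it can be computed as below. cosine_similarity is a hypothetical helper, not the script's code.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two latent vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For example, cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0])) returns 1.0, and orthogonal vectors give 0.0.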
python encode_dataset.py run_dir=RUN_DIR
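Encoding the whole dataset once means that a text query then only needs a single text encoding and a matrix product against the stored motion latents. Below is a sketch of that nearest-neighbor search, assuming the motion latents are held in a NumPy array with one row per motion. The output format of encode_dataset.py, whether the demo searches exactly this way, and the name top_k are assumptions.

```python
import numpy as np


def top_k(query: np.ndarray, bank: np.ndarray, k: int = 5) -> tuple[np.ndarray, np.ndarray]:
    """Indices and cosine scores of the k rows of `bank` closest to `query`.

    query: (dim,) text latent; bank: (n_motions, dim) precomputed motion latents.
    """
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    scores = b @ q  # (n_motions,) cosine similarities
    k = min(k, len(scores))
    idx = np.argsort(-scores)[:k]  # best matches first
    return idx, scores[idx]
```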
Run this command:
python app.py
and then open your web browser at the address http://localhost:7860.
The code will be available a bit later.
Details and differences
The TEMOS code was probably a bit too abstract, and some users struggled to understand it. As TMR and TEMOS share a similar architecture, I took the opportunity to rewrite TEMOS in this repo (src/model/temos.py) to make it more user-friendly. Note that in this repo, the motion representation is different from the original TEMOS paper (see DATASETS.md for more details). Another difference is that I precompute the token embeddings (from distilbert) beforehand, as I am not fine-tuning distilbert for the final model. This makes training around 2x faster and more memory efficient.
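Because distilbert stays frozen, its output for a given sentence never changes during training, so it only needs to be computed once per unique text and can then be looked up at every epoch. The repo stores these embeddings on disk (in datasets/annotations/humanml3d/token_embeddings for HumanML3D). The in-memory sketch below, with a hypothetical encode function standing in for distilbert, only illustrates the idea.

```python
from typing import Callable

import numpy as np


def build_cache(texts: list[str], encode: Callable[[str], np.ndarray]) -> dict[str, np.ndarray]:
    """Run the frozen encoder once per unique text and keep the result."""
    cache: dict[str, np.ndarray] = {}
    for text in texts:
        if text not in cache:  # duplicates reuse the stored embedding
            cache[text] = encode(text)
    return cache
```

During training, a batch then reads cache[text] instead of running the text model, which is where the speed and memory savings come from.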
The code and the generations are not fully tested yet, I will update the README with pretrained models and more information later.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including PyTorch, PyTorch3D, Hugging Face, Hydra, and uses datasets which each have their own respective licenses that must also be followed.