Motion In-betweening via Two-stage Transformers


Jia Qin, Youyi Zheng, and Kun Zhou. 2022. Motion In-betweening via Two-stage Transformers. ACM Trans. Graph. 41, 6, Article 184 (December 2022), 16 pages.

Getting Started

  1. Download the LAFAN1 dataset.

  2. Extract it to the datasets folder. The BVH files should be located in the motion_inbetweening/datasets/lafan1 folder.

  3. Download the pre-trained models from the Releases Page. Extract them to the motion_inbetweening/experiments folder.

  4. Install PyTorch. The code has been tested with Python 3.8 and PyTorch 1.8.2.
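
Before running anything, it can help to confirm that the dataset landed where step 2 expects it. The snippet below is a minimal sketch, not part of this repository: find_bvh_files is a hypothetical helper, and the default path assumes you run it from the directory that contains motion_inbetweening.

```python
import sys
from pathlib import Path


def find_bvh_files(dataset_dir):
    """Return the sorted list of .bvh files directly inside dataset_dir."""
    root = Path(dataset_dir)
    if not root.is_dir():
        return []
    # Compare the extension case-insensitively, since some tools write ".BVH".
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".bvh"
    )


# Path taken from step 2 above; adjust it if your checkout lives elsewhere.
dataset_dir = "motion_inbetweening/datasets/lafan1"
files = find_bvh_files(dataset_dir)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"{len(files)} BVH files found in {dataset_dir}")
if not files:
    print("No BVH files found; check where the dataset was extracted.")
```

If the count is zero, the archive was most likely extracted one folder level too high or too low.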

Run Baseline Benchmark

Under the scripts folder, run python lafan1_context_model

This reproduces the baseline results reported in the Robust Motion In-betweening paper (Harvey et al., 2020). If the LAFAN1 dataset has been set up properly, you should see the following results:

trans:  5
zerov_pos: 1.5231, zerov_quat: 0.56, zerov_npss: 0.0053
inter_pos: 0.3729, inter_quat: 0.22, inter_npss: 0.0023
trans: 15
zerov_pos: 3.6946, zerov_quat: 1.10, zerov_npss: 0.0522
inter_pos: 1.2489, inter_quat: 0.62, inter_npss: 0.0391
trans: 30
zerov_pos: 6.6005, zerov_quat: 1.51, zerov_npss: 0.2318
inter_pos: 2.3159, inter_quat: 0.98, inter_npss: 0.2013
trans: 45
zerov_pos: 9.3293, zerov_quat: 1.81, zerov_npss: 0.4918
inter_pos: 3.4471, inter_quat: 1.25, inter_npss: 0.4493
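
The zerov_* and inter_* rows refer to the two baselines from Harvey et al. (2020): zero-velocity, which repeats the last context frame, and interpolation, which blends between the last context frame and the target frame. The sketch below illustrates both ideas on plain position vectors. It is an illustration only, not the benchmark code: the function names are hypothetical, and rotation handling (spherical interpolation of quaternions in the original baseline) is left out.

```python
def zero_velocity(last_frame, n_trans):
    """Repeat the last known frame for every transition frame."""
    return [list(last_frame) for _ in range(n_trans)]


def linear_interpolation(last_frame, target_frame, n_trans):
    """Linearly blend from last_frame towards target_frame over n_trans frames."""
    frames = []
    for i in range(1, n_trans + 1):
        # t stays strictly between 0 and 1, so neither endpoint is duplicated.
        t = i / (n_trans + 1)
        frames.append(
            [(1.0 - t) * a + t * b for a, b in zip(last_frame, target_frame)]
        )
    return frames


last = [0.0, 1.0, 0.0]    # last context frame, e.g. a root position
target = [4.0, 1.0, 8.0]  # target keyframe
print(zero_velocity(last, 3))
print(linear_interpolation(last, target, 3))
```

The zero-velocity error grows quickly with the transition length because the character is frozen in place, which matches the gap between the zerov_pos and inter_pos columns above.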

Generate Transition

To generate in-between motion with the full method (Detail + Context Transformer), run the evaluation script:


usage: [-h] [-s DATASET] [-i INDEX] [-t TRANS] [-d] [-p] det_config ctx_config

Evaluate detail model. No post-processing applied by default.

positional arguments:
  det_config            detail config name
  ctx_config            context config name

optional arguments:
  -h, --help            show this help message and exit
  -s DATASET, --dataset DATASET
                        dataset name (default=benchmark)
  -i INDEX, --index INDEX
                        data index
  -t TRANS, --trans TRANS
                        transition length (default=30)
  -d, --debug           debug mode
  -p, --post_processing
                        apply post-processing


  1. Get benchmark stats on the LAFAN1 dataset with a transition length of 5 frames:

    python lafan1_detail_model lafan1_context_model -t 5

    You should see the same stats as reported in our paper:

    trans 5: gpos: 0.1049, gquat: 0.0994, npss: 0.0011

    Try other transition lengths and you should get:

    trans 15: gpos: 0.3943, gquat: 0.2839, npss: 0.0188
    trans 30: gpos: 0.8948, gquat: 0.5446, npss: 0.1124
    trans 45: gpos: 1.6777, gquat: 0.8727, npss: 0.3217

  2. Generate 30 transition frames for the clip with index=100 in the LAFAN1 benchmark dataset:

    python lafan1_detail_model lafan1_context_model -t 30 -i 100

    You should get the generated transition and the corresponding ground truth in JSON format under the scripts folder.
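
To check a benchmark run against the reference numbers above without comparing by eye, the printed stats lines can be parsed and compared with a small tolerance. This is a minimal sketch that assumes the output format shown above ("trans N: name: value, ..."); parse_stats_line, check_line, and EXPECTED are hypothetical names, not part of the repository.

```python
import re

# Matches "name: 0.1234" pairs such as "gpos: 0.1049".
METRIC_RE = re.compile(r"(\w+):\s*([-+]?\d+(?:\.\d+)?)")

# Reference values copied from the full-method results listed above.
EXPECTED = {
    5: {"gpos": 0.1049, "gquat": 0.0994, "npss": 0.0011},
    15: {"gpos": 0.3943, "gquat": 0.2839, "npss": 0.0188},
    30: {"gpos": 0.8948, "gquat": 0.5446, "npss": 0.1124},
    45: {"gpos": 1.6777, "gquat": 0.8727, "npss": 0.3217},
}


def parse_stats_line(line):
    """Parse 'trans 5: gpos: 0.1049, ...' into (5, {'gpos': 0.1049, ...})."""
    head, _, rest = line.partition(":")
    trans = int(head.split()[1])
    metrics = {name: float(value) for name, value in METRIC_RE.findall(rest)}
    return trans, metrics


def check_line(line, tol=1e-4):
    """Return True if every expected metric is within tol of the parsed value."""
    trans, metrics = parse_stats_line(line)
    expected = EXPECTED[trans]
    return all(abs(metrics[name] - value) <= tol for name, value in expected.items())


print(check_line("trans 5: gpos: 0.1049, gquat: 0.0994, npss: 0.0011"))
```

The tolerance is an arbitrary choice here; small differences in the last digit can appear across hardware or library versions.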


Generate Transition by Context Transformer Only

If you prefer to use only the Context Transformer, run its evaluation script instead. Its usage is very similar to that of the full method above. Run the script with -h to see its usage info.


  1. Get benchmark stats on the LAFAN1 dataset with a transition length of 5 frames.

    Context Transformer only, WITHOUT post-processing:

    python lafan1_context_model -t 5
    trans 5: gpos: 0.1717, gquat: 0.1325, npss: 0.0015

    Results of other transition lengths:

    trans 15: gpos: 0.4923, gquat: 0.3287, npss: 0.0212
    trans 30: gpos: 1.0663, gquat: 0.5991, npss: 0.1238
    trans 45: gpos: 1.9972, gquat: 0.9170, npss: 0.3369

    Context Transformer only, WITH post-processing:

    python lafan1_context_model_constraints -t 5 -p
    trans 5: gpos: 0.1288, gquat: 0.1143, npss: 0.0015 (w/ post-processing)

    Results of other transition lengths:

    trans 15: gpos: 0.4623, gquat: 0.3154, npss: 0.0211 (w/ post-processing)
    trans 30: gpos: 1.0354, gquat: 0.5898, npss: 0.1210 (w/ post-processing)
    trans 45: gpos: 1.9439, gquat: 0.9114, npss: 0.3349 (w/ post-processing)

  2. Generate 30 transition frames, with post-processing, for the clip with index=100 in the LAFAN1 benchmark dataset:

    python lafan1_context_model_constraints -t 30 -i 100 -p

    You should get the predicted transition and the ground truth in JSON format under the scripts folder.
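
To put the three variants side by side, the gpos values listed in this README can be turned into relative reductions with respect to the Context Transformer alone. The snippet below is a small worked calculation using only those numbers; relative_reduction is a hypothetical helper name.

```python
# gpos values copied from the results listed in this README.
GPOS = {
    "context only": {5: 0.1717, 15: 0.4923, 30: 1.0663, 45: 1.9972},
    "context + post-processing": {5: 0.1288, 15: 0.4623, 30: 1.0354, 45: 1.9439},
    "detail + context": {5: 0.1049, 15: 0.3943, 30: 0.8948, 45: 1.6777},
}


def relative_reduction(baseline, value):
    """Fraction by which value is lower than baseline."""
    return (baseline - value) / baseline


for name in ("context + post-processing", "detail + context"):
    for trans, base in GPOS["context only"].items():
        reduction = relative_reduction(base, GPOS[name][trans])
        print(f"{name:26s} trans {trans:2d}: {100 * reduction:5.1f}% lower gpos")
```

In these numbers, both post-processing and the Detail Transformer give their largest relative gains on the shortest transition.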


Visualize Output Motion in Autodesk Maya

Use the visualize function in the motion_inbetween.visualization.maya module to load motions in JSON format.


Training From Scratch

To train the models yourself, first install visdom to visualize training statistics.

pip install visdom

Launch a local visdom server before training starts:

$ visdom
Checking for scripts.
It's Alive!
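
If training cannot reach visdom, a quick check is whether anything is listening on the server port. The sketch below assumes visdom is running on localhost with its default port 8097; server_is_listening is a hypothetical helper that only tests the TCP connection, not the visdom API itself.

```python
import socket


def server_is_listening(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if server_is_listening():
    print("visdom appears to be running on localhost:8097")
else:
    print("nothing is listening on localhost:8097; start visdom first")
```

If you launched visdom on a different port, pass that port to the helper instead.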

First, train the Context Transformer by running the training script:

usage: [-h] config

Train context model.

positional arguments:
  config      config name

optional arguments:
  -h, --help  show this help message and exit


python lafan1_context_model

Then train the Detail Transformer by running the training script:

usage: [-h] det_config ctx_config

Train detail model.

positional arguments:
  det_config  detail config name
  ctx_config  context config name

optional arguments:
  -h, --help  show this help message and exit


python lafan1_detail_model lafan1_context_model