sign-segmentation

Sign Language Segmentation with Temporal Convolutional Networks (ICASSP'21) and Sign Segmentation with Changepoint-Modulated Pseudo-Labelling (CVPRW'21)


Temporal segmentation of sign language videos

This repository provides code for the following two papers:

  • Sign Language Segmentation with Temporal Convolutional Networks (ICASSP 2021)
  • Sign Segmentation with Changepoint-Modulated Pseudo-Labelling (CVPRW 2021)

[Project page]

[Demo GIF]

Contents

  • Setup
  • Data and models
  • Demo
  • Training
  • Citation
  • License
  • Acknowledgements

Setup

# Clone this repository
git clone git@github.com:RenzKa/sign-segmentation.git
cd sign-segmentation/
# Create signseg_env environment
conda env create -f environment.yml
conda activate signseg_env

Data and models

You can download our pretrained models (models.zip [302MB]) and data (data.zip [5.5GB]) used in the experiments here or by executing download/download_*.sh. The unzipped data/ and models/ folders should be placed in the root directory of the repository (for running the demo, downloading the models folder is sufficient).

Data:

Please cite the original datasets when using the data: BSL Corpus | Phoenix14. We provide the pre-extracted features and metadata. See here for a detailed description of the data files.

  • Features: data/features/*/*/features.mat
  • Metadata: data/info/*/info.pkl
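For a quick sanity check of the download, the files can be opened as in the minimal sketch below. Everything about the internal layout is an assumption here (see the linked data description for the authoritative format), and scipy.io.loadmat only reads .mat files up to v7.2; a v7.3 file would need h5py instead.

# Minimal sketch for inspecting the downloaded data files; the glob patterns
# come from the paths above, the key layout is an assumption.
import glob
import pickle

import scipy.io

feature_path = sorted(glob.glob("data/features/*/*/features.mat"))[0]
info_path = sorted(glob.glob("data/info/*/info.pkl"))[0]

features = scipy.io.loadmat(feature_path)
print({k: getattr(v, "shape", type(v)) for k, v in features.items()})

with open(info_path, "rb") as f:
    info = pickle.load(f)
print(type(info))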

Models:

  • I3D weights, trained for sign classification: models/i3d/*.pth.tar
  • MS-TCN weights for the demo (see tables below for links to the other models): models/ms-tcn/*.model
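A quick way to inspect the checkpoints is sketched below. That both files load with torch.load is an assumption based on their extensions, and the dictionary layout may differ from what the prints suggest.

# Minimal sketch for inspecting the checkpoints on CPU.
import glob

import torch

i3d_ckpt = torch.load(sorted(glob.glob("models/i3d/*.pth.tar"))[0], map_location="cpu")
print(list(i3d_ckpt.keys()) if isinstance(i3d_ckpt, dict) else type(i3d_ckpt))

mstcn_ckpt = torch.load(sorted(glob.glob("models/ms-tcn/*.model"))[0], map_location="cpu")
print(list(mstcn_ckpt.keys()) if isinstance(mstcn_ckpt, dict) else type(mstcn_ckpt))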

The folder structure should be as below:

sign-segmentation/models/
  i3d/
    i3d_kinetics_bsl1k_bslcp.pth.tar
    i3d_kinetics_bslcp.pth.tar
    i3d_kinetics_phoenix_1297.pth.tar
  ms-tcn/
    mstcn_bslcp_i3d_bslcp.model

Demo

The demo folder contains a sample script to estimate the segments of a given sign language video. It is also possible to start from pre-extracted I3D features and only apply the MS-TCN model. The --generate_vtt flag produces a .vtt file that can be used with our modified version of the VIA annotation tool (a sketch of the format follows the examples below):

usage: demo.py [-h] [--starting_point {video,feature}]
               [--i3d_checkpoint_path I3D_CHECKPOINT_PATH]
               [--mstcn_checkpoint_path MSTCN_CHECKPOINT_PATH]
               [--video_path VIDEO_PATH] [--feature_path FEATURE_PATH]
               [--save_path SAVE_PATH] [--num_in_frames NUM_IN_FRAMES]
               [--stride STRIDE] [--batch_size BATCH_SIZE] [--fps FPS]
               [--num_classes NUM_CLASSES] [--slowdown_factor SLOWDOWN_FACTOR]
               [--save_features] [--save_segments] [--viz] [--generate_vtt]

Example usage:

# Print arguments
python demo/demo.py -h
# Save features and predictions and create visualization of results at full speed
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 1 --save_features --save_segments --viz
# Save only predictions and create visualization of results slowed down by factor 6
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --save_segments --viz
# Create visualization of results slowed down by factor 6 and .vtt file for VIA tool
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --viz --generate_vtt
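For reference, WebVTT is a plain-text format: a WEBVTT header followed by one timed cue per segment. Below is a minimal, illustrative sketch of writing segments as .vtt cues; the cue text "SIGN" and the example times are made up, and the actual --generate_vtt output may differ in labels and formatting.

# Illustrative sketch of WebVTT output: one cue per predicted segment.
def to_timestamp(seconds):
    # WebVTT timestamps use the form HH:MM:SS.mmm.
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def write_vtt(segments, path):
    # segments: list of (start_seconds, end_seconds) tuples.
    with open(path, "w") as f:
        f.write("WEBVTT\n\n")
        for i, (start, end) in enumerate(segments, 1):
            f.write(f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\nSIGN\n\n")

write_vtt([(0.0, 0.48), (0.52, 1.16)], "demo_video.vtt")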

The demo will:

  1. use the models/i3d/i3d_kinetics_bslcp.pth.tar pretrained I3D model to extract features,
  2. use the models/ms-tcn/mstcn_bslcp_i3d_bslcp.model pretrained MS-TCN model to predict the segments from the extracted features,
  3. save results (depending on which flags are used).
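As a rough picture of step 2, the framewise MS-TCN predictions can be turned into segments by grouping runs of consecutive frames with the same predicted class. The sketch below shows this idea only; the label values, class meanings, and fps are illustrative and not the demo's actual post-processing.

# Minimal sketch: group consecutive identical frame labels into segments.
import itertools

def frames_to_segments(frame_labels, fps=25):
    segments, t = [], 0
    for label, run in itertools.groupby(frame_labels):
        n = len(list(run))
        segments.append((label, t / fps, (t + n) / fps))  # (label, start_s, end_s)
        t += n
    return segments

# e.g. with 1 = "in sign" and 0 = "boundary/background":
print(frames_to_segments([0, 1, 1, 1, 0, 0, 1, 1]))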

Training

Train ICASSP

Run the corresponding run file (*.sh) to train the MS-TCN on pre-extracted features from BSL Corpus. During training, a .log file for TensorBoard is generated; in addition, the metrics are saved in train_progress.txt.

  • Influence of I3D training (fully-supervised segmentation results on BSL Corpus)

  ID  I3D training data    mF1B       mF1S       Links (for seed=0)
  1   BSL Corpus           68.68±0.6  47.71±0.8  run, args, I3D model, MS-TCN model, logs
  2   BSL1K -> BSL Corpus  66.17±0.5  44.44±1.0  run, args, I3D model, MS-TCN model, logs
  • Fully-supervised segmentation results on PHOENIX14

  ID  I3D training data  MS-TCN training data  mF1B       mF1S       Links (for seed=0)
  3   BSL Corpus         PHOENIX14             65.06±0.5  44.42±2.0  run, args, I3D model, MS-TCN model, logs
  4   PHOENIX14          PHOENIX14             71.50±0.2  52.78±1.6  run, args, I3D model, MS-TCN model, logs
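For context on the metrics: mF1S-style scores match predicted segments to ground-truth segments by temporal IoU and average the F1 over several thresholds, while mF1B is the boundary-level analogue that matches predicted and true boundaries within a temporal tolerance. The sketch below shows the general recipe only; the greedy matching and the threshold grid are assumptions, so consult the ICASSP paper for the exact evaluation protocol.

# Illustrative sketch of a segment-level F1 averaged over IoU thresholds.
def iou(a, b):
    # Temporal IoU of two (start, end) segments.
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def f1_at_iou(pred, gt, thresh):
    # Greedily match each predicted segment to its best unmatched ground truth.
    matched, tp = set(), 0
    for p in pred:
        candidates = [i for i in range(len(gt)) if i not in matched]
        if not candidates:
            break
        best = max(candidates, key=lambda i: iou(p, gt[i]))
        if iou(p, gt[best]) >= thresh:
            matched.add(best)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def mean_f1(pred, gt, thresholds=(0.4, 0.5, 0.6, 0.75)):
    # Average F1 over a grid of IoU thresholds (grid values are illustrative).
    return sum(f1_at_iou(pred, gt, t) for t in thresholds) / len(thresholds)

print(mean_f1([(0.0, 0.5), (0.6, 1.0)], [(0.0, 0.45), (0.55, 1.0)]))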

Train CVPRW

Requirement: pre-extracted pseudo-labels, changepoints, or CMPL labels:

  1. Save the pre-trained model in models/ms-tcn/*.model
  2. a) Extract pseudo-labels before extracting CMPL labels: Extract only PL | Extract CMPL | Extract PL and CMPL
     b) Extract changepoints separately for training: Extract CP (specify the correct model path)
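Conceptually, the pseudo-labelling step runs the source-trained model over unlabelled target videos and keeps its framewise predictions as training targets, while CMPL additionally refines those labels with changepoints detected in the feature stream. The following is a heavily simplified, illustrative sketch of the pseudo-label half only; the confidence masking and all names are assumptions, and the real extraction scripts are linked above.

# Heavily simplified sketch: framewise predictions become training targets,
# with low-confidence frames masked out of the loss (an assumed heuristic).
import numpy as np

def extract_pseudo_labels(frame_probs, conf_thresh=0.6, ignore_index=-100):
    # frame_probs: (num_frames, num_classes) softmax outputs of the source model.
    conf = frame_probs.max(axis=1)
    labels = frame_probs.argmax(axis=1)
    labels[conf < conf_thresh] = ignore_index  # masked frames are skipped in the loss
    return labels

probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
print(extract_pseudo_labels(probs))  # -> [0, -100, 1]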
  • Pseudo-labelling techniques on PHOENIX14

  ID  Method         Adaptation protocol  mF1B       mF1S       Links (for seed=0)
  5   Pseudo-labels  inductive            47.94±1.0  32.45±0.3  run, args, I3D model, MS-TCN model, logs
  6   Changepoints   inductive            48.51±0.4  34.45±1.4  run, args, I3D model, MS-TCN model, logs
  7   CMPL           inductive            53.57±0.7  33.82±0.0  run, args, I3D model, MS-TCN model, logs
  8   Pseudo-labels  transductive         47.62±0.4  32.11±0.9  run, args, I3D model, MS-TCN model, logs
  9   Changepoints   transductive         48.29±0.1  35.31±1.4  run, args, I3D model, MS-TCN model, logs
  10  CMPL           transductive         53.53±0.1  32.93±0.9  run, args, I3D model, MS-TCN model, logs

Citation

If you use this code and data, please cite the following:

@inproceedings{Renz2021signsegmentation_a,
    author       = "Katrin Renz and Nicolaj C. Stache and Samuel Albanie and G{\"u}l Varol",
    title        = "Sign Language Segmentation with Temporal Convolutional Networks",
    booktitle    = "ICASSP",
    year         = "2021",
}
@inproceedings{Renz2021signsegmentation_b,
    author       = "Katrin Renz and Nicolaj C. Stache and Neil Fox and G{\"u}l Varol and Samuel Albanie",
    title        = "Sign Segmentation with Changepoint-Modulated Pseudo-Labelling",
    booktitle    = "CVPRW",
    year         = "2021",
}

License

The license in this repository covers only the code. For data.zip and models.zip, we refer to the terms and conditions of the original datasets.

Acknowledgements

The code builds on the github.com/yabufarha/ms-tcn repository. The demo reuses parts of github.com/gulvarol/bsl1k. We would like to thank C. Camgoz for his help with the BSL Corpus data preparation.