
Sign Language Segmentation with Temporal Convolutional Networks (ICASSP'21) and Sign Segmentation with Changepoint-Modulated Pseudo-Labelling (CVPRW'21)


Temporal segmentation of sign language videos

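This repository provides code for the following two papers:

  • Sign Language Segmentation with Temporal Convolutional Networks (ICASSP'21)
  • Sign Segmentation with Changepoint-Modulated Pseudo-Labelling (CVPRW'21)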

[Project page]




# Clone this repository
git clone git@github.com:RenzKa/sign-segmentation.git
cd sign-segmentation/
# Create signseg_env environment
conda env create -f environment.yml
conda activate signseg_env

Data and models

You can download our pretrained models (models.zip [302MB]) and data (data.zip [5.5GB]) used in the experiments here or by executing download/download_*.sh. The unzipped data/ and models/ folders should be located in the root directory of the repository (to run the demo, downloading the models folder is sufficient).


Please cite the original datasets when using the data: BSL Corpus | Phoenix14. We provide the pre-extracted features and metadata. See here for a detailed description of the data files.

  • Features: data/features/*/*/features.mat
  • Metadata: data/info/*/info.pkl


  • I3D weights, trained for sign classification: models/i3d/*.pth.tar
  • MS-TCN weights for the demo (see tables below for links to the other models): models/ms-tcn/*.model
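After unzipping, a quick check can confirm that the files sit where the paths above say they should. The following is a minimal sketch, not part of the repository: the function name `check_layout` and the dictionary keys are hypothetical, and only the glob patterns are taken from the lists above.

```python
from pathlib import Path

# Glob patterns copied from the file locations listed above.
# The dictionary keys are illustrative names, not repository identifiers.
EXPECTED_PATTERNS = {
    "features": "data/features/*/*/features.mat",
    "metadata": "data/info/*/info.pkl",
    "i3d_weights": "models/i3d/*.pth.tar",
    "mstcn_weights": "models/ms-tcn/*.model",
}


def check_layout(repo_root="."):
    """Return {name: number of files matching the expected pattern}."""
    root = Path(repo_root)
    return {
        name: len(list(root.glob(pattern)))
        for name, pattern in EXPECTED_PATTERNS.items()
    }


if __name__ == "__main__":
    # Run from the root directory of the repository.
    for name, count in check_layout().items():
        status = "ok" if count else "MISSING"
        print(f"{name:14s} {count:5d} file(s)  [{status}]")
```

A count of zero for `features` or `metadata` is expected if only models.zip was downloaded for the demo.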

The folder structure should be as below:



The demo folder contains a sample script to estimate the segments of a given sign language video. It is also possible to use pre-extracted I3D features as a starting point and apply only the MS-TCN model. The --generate_vtt flag generates a .vtt file that can be used with our modified version of the VIA annotation tool:

usage: demo.py [-h] [--starting_point {video,feature}]
               [--i3d_checkpoint_path I3D_CHECKPOINT_PATH]
               [--mstcn_checkpoint_path MSTCN_CHECKPOINT_PATH]
               [--video_path VIDEO_PATH] [--feature_path FEATURE_PATH]
               [--save_path SAVE_PATH] [--num_in_frames NUM_IN_FRAMES]
               [--stride STRIDE] [--batch_size BATCH_SIZE] [--fps FPS]
               [--num_classes NUM_CLASSES] [--slowdown_factor SLOWDOWN_FACTOR]
               [--save_features] [--save_segments] [--viz] [--generate_vtt]

Example usage:

# Print arguments
python demo/demo.py -h
# Save features and predictions, and create a visualization of the results at full speed
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 1 --save_features --save_segments --viz
# Save only predictions, and create a visualization of the results slowed down by a factor of 6
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --save_segments --viz
# Create a visualization of the results slowed down by a factor of 6, and a .vtt file for the VIA tool
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --viz --generate_vtt
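To give a sense of what a .vtt file contains, here is a minimal sketch that writes frame-level segments as WebVTT cues. It is an illustration only and may differ from what demo.py actually produces: the function names, the cue text, the end-exclusive frame convention, and the default of 25 fps are all assumptions.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments, fps=25.0, label="sign"):
    """Build WebVTT text from (start_frame, end_frame) pairs.

    Assumptions: end_frame is exclusive, fps is the video frame rate,
    and each cue is simply named "<label> <index>".
    """
    lines = ["WEBVTT", ""]
    for index, (start, end) in enumerate(segments, 1):
        start_ts = format_timestamp(start / fps)
        end_ts = format_timestamp(end / fps)
        lines.append(f"{start_ts} --> {end_ts}")
        lines.append(f"{label} {index}")
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(segments_to_vtt([(0, 25), (30, 55)], fps=25.0))
```

Each cue is a timestamp line of the form `start --> end` followed by its text, which is the basic structure any WebVTT reader expects.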

The demo will:

  1. use the models/i3d/i3d_kinetics_bslcp.pth.tar pretrained I3D model to extract features,
  2. use the models/ms-tcn/mstcn_bslcp_i3d_bslcp.model pretrained MS-TCN model to predict the segments from the features,
  3. save results (depending on which flags are used).
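The step from per-frame predictions to segments can be sketched as grouping consecutive frames that share a label. This is a minimal illustration under stated assumptions rather than the repository's implementation: the function name is hypothetical, and the encoding (0 for frames inside a sign, any other value for boundary frames) is assumed here and may not match the model's actual output.

```python
from itertools import groupby


def labels_to_segments(frame_labels, sign_label=0):
    """Group per-frame labels into (start_frame, end_frame) segments.

    Assumption: frames equal to sign_label lie inside a sign and all
    other frames are boundaries. end_frame is exclusive.
    """
    segments = []
    position = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label == sign_label:
            segments.append((position, position + length))
        position += length
    return segments


if __name__ == "__main__":
    # Two boundary runs split nine frames into three segments.
    predictions = [0, 0, 1, 0, 0, 0, 1, 1, 0]
    print(labels_to_segments(predictions))
```

Dividing the frame indices by the frame rate (see the --fps argument) converts the segments to times in seconds.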



Run the corresponding run file (*.sh) to train the MS-TCN with pre-extracted features on BSL Corpus. During training, a .log file for TensorBoard is generated. In addition, the metrics are saved in train_progress.txt.

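  • Influence of I3D training (fully-supervised segmentation results on BSL Corpus)

| ID | Model               | mF1B      | mF1S      | Links (for seed=0)                       |
|----|---------------------|-----------|-----------|------------------------------------------|
| 1  | BSL Corpus          | 68.68±0.6 | 47.71±0.8 | run, args, I3D model, MS-TCN model, logs |
| 2  | BSL1K -> BSL Corpus | 66.17±0.5 | 44.44±1.0 | run, args, I3D model, MS-TCN model, logs |

  • Fully-supervised segmentation results on PHOENIX14

| ID | I3D training data | MS-TCN training data | mF1B      | mF1S      | Links (for seed=0)                       |
|----|-------------------|----------------------|-----------|-----------|------------------------------------------|
| 3  | BSL Corpus        | PHOENIX14            | 65.06±0.5 | 44.42±2.0 | run, args, I3D model, MS-TCN model, logs |
| 4  | PHOENIX14         | PHOENIX14            | 71.50±0.2 | 52.78±1.6 | run, args, I3D model, MS-TCN model, logs |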


Requirement: pre-extracted pseudo-labels/changepoints or CMPL-labels:

  1. Save the pre-trained model in models/ms-tcn/*.model
  2. Extract the labels:
     a) Extract pseudo-labels before extracting CMPL-labels: Extract only PL | Extract CMPL | Extract PL and CMPL
     b) Extract changepoints separately for training: Extract CP -> specify the correct model path

  • Pseudo-labelling techniques on PHOENIX14

| ID | Method        | Adaptation protocol | mF1B      | mF1S      | Links (for seed=0)                       |
|----|---------------|---------------------|-----------|-----------|------------------------------------------|
| 5  | Pseudo-labels | inductive           | 47.94±1.0 | 32.45±0.3 | run, args, I3D model, MS-TCN model, logs |
| 6  | Changepoints  | inductive           | 48.51±0.4 | 34.45±1.4 | run, args, I3D model, MS-TCN model, logs |
| 7  | CMPL          | inductive           | 53.57±0.7 | 33.82±0.0 | run, args, I3D model, MS-TCN model, logs |
| 8  | Pseudo-labels | transductive        | 47.62±0.4 | 32.11±0.9 | run, args, I3D model, MS-TCN model, logs |
| 9  | Changepoints  | transductive        | 48.29±0.1 | 35.31±1.4 | run, args, I3D model, MS-TCN model, logs |
| 10 | CMPL          | transductive        | 53.53±0.1 | 32.93±0.9 | run, args, I3D model, MS-TCN model, logs |


If you use this code and data, please cite the following:

    author       = "Katrin Renz and Nicolaj C. Stache and Samuel Albanie and G{\"u}l Varol",
    title        = "Sign Language Segmentation with Temporal Convolutional Networks",
    booktitle    = "ICASSP",
    year         = "2021",

    author       = "Katrin Renz and Nicolaj C. Stache and Neil Fox and G{\"u}l Varol and Samuel Albanie",
    title        = "Sign Segmentation with Changepoint-Modulated Pseudo-Labelling",
    booktitle    = "CVPRW",
    year         = "2021",


The license in this repository covers only the code. For data.zip and models.zip, we refer to the terms and conditions of the original datasets.


The code builds on the github.com/yabufarha/ms-tcn repository. The demo reuses parts from github.com/gulvarol/bsl1k. We would like to thank C. Camgoz for the help with the BSLCORPUS data preparation.