This repository provides code for the following two papers:
- Katrin Renz, Nicolaj C. Stache, Samuel Albanie and Gül Varol, Sign Language Segmentation with Temporal Convolutional Networks, ICASSP 2021. [arXiv]
- Katrin Renz, Nicolaj C. Stache, Neil Fox, Gül Varol and Samuel Albanie, Sign Segmentation with Changepoint-Modulated Pseudo-Labelling, CVPRW 2021. [arXiv]
```bash
# Clone this repository
git clone git@github.com:RenzKa/sign-segmentation.git
cd sign-segmentation/
# Create signseg_env environment
conda env create -f environment.yml
conda activate signseg_env
```
You can download our pretrained models (`models.zip` [302MB]) and data (`data.zip` [5.5GB]) used in the experiments here or by executing `download/download_*.sh`. The unzipped `data/` and `models/` folders should be located in the root directory of the repository (for using the demo, downloading the `models` folder is sufficient).
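Alternatively, the archives can be unpacked programmatically once downloaded; a minimal sketch, assuming `models.zip`/`data.zip` already sit in the repository root:

```python
# unpack_downloads.py -- minimal sketch; assumes models.zip / data.zip are
# already in the repository root (e.g. fetched via download/download_*.sh).
from pathlib import Path
import zipfile

for name in ["models.zip", "data.zip"]:
    archive = Path(name)
    if not archive.exists():
        # models.zip alone is sufficient for running the demo
        print(f"{name} not found, skipping")
        continue
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(".")  # yields models/ and data/ in the repo root
    print(f"extracted {name}")
```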
Please cite the original datasets when using the data: BSL Corpus | Phoenix14. We provide the pre-extracted features and metadata. See here for a detailed description of the data files.
- Features: `data/features/*/*/features.mat`
- Metadata: `data/info/*/info.pkl` (a loading sketch follows this list)
- I3D weights, trained for sign classification: `models/i3d/*.pth.tar`
- MS-TCN weights for the demo (see the tables below for links to the other models): `models/ms-tcn/*.model`
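As a quick sanity check after unpacking, the features and metadata can be inspected with standard tools; a minimal sketch, assuming the `.mat` files are scipy-readable (MATLAB v7.3 files would need `h5py` instead):

```python
# inspect_data.py -- minimal sketch for peeking at the pre-extracted data;
# key names inside the files are not assumed here, only printed.
import glob
import pickle

import scipy.io as sio

# I3D features (one .mat file per dataset/model combination)
for mat_path in glob.glob("data/features/*/*/features.mat"):
    mat = sio.loadmat(mat_path)
    print(mat_path, [k for k in mat if not k.startswith("__")])

# metadata pickles
for pkl_path in glob.glob("data/info/*/info.pkl"):
    with open(pkl_path, "rb") as f:
        info = pickle.load(f)
    print(pkl_path, type(info))
```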
The folder structure should be as below:
```
sign-segmentation/models/
    i3d/
        i3d_kinetics_bsl1k_bslcp.pth.tar
        i3d_kinetics_bslcp.pth.tar
        i3d_kinetics_phoenix_1297.pth.tar
    ms-tcn/
        mstcn_bslcp_i3d_bslcp.model
```
The `demo` folder contains a sample script to estimate the segments of a given sign language video. It is also possible to use pre-extracted I3D features as a starting point, and only apply the MS-TCN model. `--generate_vtt` generates a `.vtt` file which can be used with our modified version of the VIA annotation tool:
```
usage: demo.py [-h] [--starting_point {video,feature}]
               [--i3d_checkpoint_path I3D_CHECKPOINT_PATH]
               [--mstcn_checkpoint_path MSTCN_CHECKPOINT_PATH]
               [--video_path VIDEO_PATH] [--feature_path FEATURE_PATH]
               [--save_path SAVE_PATH] [--num_in_frames NUM_IN_FRAMES]
               [--stride STRIDE] [--batch_size BATCH_SIZE] [--fps FPS]
               [--num_classes NUM_CLASSES] [--slowdown_factor SLOWDOWN_FACTOR]
               [--save_features] [--save_segments] [--viz] [--generate_vtt]
```
Example usage:
```bash
# Print arguments
python demo/demo.py -h
# Save features and predictions and create visualization of results at full speed
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 1 --save_features --save_segments --viz
# Save only predictions and create visualization of results slowed down by factor 6
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --save_segments --viz
# Create visualization of results slowed down by factor 6 and .vtt file for VIA tool
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --viz --generate_vtt
```
The demo will:
- use the `models/i3d/i3d_kinetics_bslcp.pth.tar` pretrained I3D model to extract features,
- use the `models/ms-tcn/mstcn_bslcp_i3d_bslcp.model` pretrained MS-TCN model to predict the segments from the features (sketched below),
- save the results (depending on which flags are used).
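Schematically, the video starting point boils down to sliding-window feature extraction followed by sequence labelling. The snippet below is an illustrative sketch only, not the repository's code: `i3d` and `mstcn` stand in for the loaded models, `num_in_frames`/`stride` mirror the demo flags, and the exact tensor layouts are assumptions (see `demo/demo.py` for the reference implementation).

```python
# pipeline_sketch.py -- illustrative sketch of the demo's two stages;
# `i3d` and `mstcn` stand in for the repo's loaded models (assumption).
import torch

def extract_features(video, i3d, num_in_frames=16, stride=1):
    """Slide a window of num_in_frames frames over the video (C, T, H, W)
    and collect one I3D feature vector per window position."""
    feats = []
    num_frames = video.shape[1]
    with torch.no_grad():
        for start in range(0, num_frames - num_in_frames + 1, stride):
            clip = video[:, start:start + num_in_frames]  # (C, num_in_frames, H, W)
            feats.append(i3d(clip.unsqueeze(0)))          # e.g. (1, 1024) per window
    return torch.cat(feats, dim=0)                        # (T', 1024)

def predict_segments(features, mstcn):
    """Feed the feature sequence to the MS-TCN and take per-frame argmax;
    contiguous runs of the 'sign' class form the predicted segments."""
    with torch.no_grad():
        # exact input/output layout is model-specific; (N, C, T) is common
        logits = mstcn(features.t().unsqueeze(0))
    return logits.argmax(dim=1).squeeze(0)                # per-frame labels
```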
Run the corresponding run file (`*.sh`) to train the MS-TCN with pre-extracted features on BSL Corpus. During training, a `.log` file for TensorBoard is generated; in addition, the metrics are saved in `train_progress.txt`.
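For a quick look at a running job without TensorBoard, one can simply follow `train_progress.txt`; a minimal, format-agnostic sketch (the path is an assumption, adjust to your run directory):

```python
# watch_progress.py -- tail-style sketch: print new lines of
# train_progress.txt as training appends them.
import time

def follow(path="train_progress.txt", poll_seconds=5.0):
    with open(path) as f:
        while True:
            line = f.readline()
            if line:
                print(line, end="")
            else:
                time.sleep(poll_seconds)

if __name__ == "__main__":
    follow()
```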
- Influence of I3D training (fully-supervised segmentation results on BSL Corpus)

ID | Model | mF1B | mF1S | Links (for seed=0) |
---|---|---|---|---|
1 | BSL Corpus | 68.68±0.6 | 47.71±0.8 | run, args, I3D model, MS-TCN model, logs |
2 | BSL1K -> BSL Corpus | 66.17±0.5 | 44.44±1.0 | run, args, I3D model, MS-TCN model, logs |
- Fully-supervised segmentation results on PHOENIX14

ID | I3D training data | MS-TCN training data | mF1B | mF1S | Links (for seed=0) |
---|---|---|---|---|---|
3 | BSL Corpus | PHOENIX14 | 65.06±0.5 | 44.42±2.0 | run, args, I3D model, MS-TCN model, logs |
4 | PHOENIX14 | PHOENIX14 | 71.50±0.2 | 52.78±1.6 | run, args, I3D model, MS-TCN model, logs |
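For reading the tables: mF1B and mF1S are boundary- and segment-level F1 scores averaged over sets of thresholds; see the papers for the exact definitions. As a rough illustration of the segment-level flavour, a single-threshold, IoU-matched segment F1 could look like the sketch below (the matching scheme and threshold are simplifications, not the papers' exact protocol):

```python
# segment_f1_sketch.py -- illustrative single-threshold segment F1;
# the papers average over several thresholds and define mF1B on boundaries.
def segment_f1(pred, gt, iou_thresh=0.5):
    """pred, gt: lists of (start, end) frame intervals, end exclusive."""
    matched = set()
    tp = 0
    for ps, pe in pred:
        best_iou, best_j = 0.0, None
        for j, (gs, ge) in enumerate(gt):
            if j in matched:
                continue  # each ground-truth segment may be matched once
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = (pe - ps) + (ge - gs) - inter
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp, fn = len(pred) - tp, len(gt) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```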
Requirement: pre-extracted pseudo-labels/changepoints or CMPL labels:
- Save the pre-trained model in `models/ms-tcn/*.model`
- a) Extract pseudo-labels before extracting CMPL labels: Extract only PL | Extract CMPL | Extract PL and CMPL
- b) Extract changepoints separately for training: Extract CP -> specify the correct model path (a toy changepoint sketch follows this list)
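For intuition on the changepoints: a detector scores each frame of the (unlabelled) target features by how much the feature statistics change around it, and peaks in that score suggest sign boundaries. The sketch below is a toy distance-based scorer only; the window size and scoring are assumptions, and the actual extraction lives behind the links above.

```python
# changepoint_sketch.py -- toy changepoint scoring on an I3D feature
# sequence of shape (T, D); the real extraction scripts are linked above.
import numpy as np

def changepoint_scores(features, width=8):
    """Score frame t by the distance between the mean feature of the
    `width` frames before t and the `width` frames after t; local peaks
    are candidate sign boundaries."""
    T = features.shape[0]
    scores = np.zeros(T)
    for t in range(width, T - width):
        before = features[t - width:t].mean(axis=0)
        after = features[t:t + width].mean(axis=0)
        scores[t] = np.linalg.norm(before - after)
    return scores
```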
- Pseudo-labelling techniques on PHOENIX14

ID | Method | Adaptation protocol | mF1B | mF1S | Links (for seed=0) |
---|---|---|---|---|---|
5 | Pseudo-labels | inductive | 47.94±1.0 | 32.45±0.3 | run, args, I3D model, MS-TCN model, logs |
6 | Changepoints | inductive | 48.51±0.4 | 34.45±1.4 | run, args, I3D model, MS-TCN model, logs |
7 | CMPL | inductive | 53.57±0.7 | 33.82±0.0 | run, args, I3D model, MS-TCN model, logs |
8 | Pseudo-labels | transductive | 47.62±0.4 | 32.11±0.9 | run, args, I3D model, MS-TCN model, logs |
9 | Changepoints | transductive | 48.29±0.1 | 35.31±1.4 | run, args, I3D model, MS-TCN model, logs |
10 | CMPL | transductive | 53.53±0.1 | 32.93±0.9 | run, args, I3D model, MS-TCN model, logs |
If you use this code and data, please cite the following:
```bibtex
@inproceedings{Renz2021signsegmentation_a,
    author    = "Katrin Renz and Nicolaj C. Stache and Samuel Albanie and G{\"u}l Varol",
    title     = "Sign Language Segmentation with Temporal Convolutional Networks",
    booktitle = "ICASSP",
    year      = "2021",
}
@inproceedings{Renz2021signsegmentation_b,
    author    = "Katrin Renz and Nicolaj C. Stache and Neil Fox and G{\"u}l Varol and Samuel Albanie",
    title     = "Sign Segmentation with Changepoint-Modulated Pseudo-Labelling",
    booktitle = "CVPRW",
    year      = "2021",
}
```
The license in this repository only covers the code. For `data.zip` and `models.zip`, we refer to the terms and conditions of the original datasets.
The code builds on the github.com/yabufarha/ms-tcn repository. The demo reuses parts of github.com/gulvarol/bsl1k. We would like to thank C. Camgoz for the help with the BSL Corpus data preparation.