Primary language: Python. License: MIT.

Gloss-Based Pipeline for Spoken to Signed Language Translation

A text-to-gloss-to-pose-to-video pipeline for spoken-to-signed language translation.

[Figure: Visualization of our pipeline]

Install

pip install git+https://github.com/ZurichNLP/spoken-to-signed-translation.git

Then, to download a lexicon, run:

download_lexicon \
  --name <signsuisse> \
  --directory <path_to_directory>

Usage

Language codes follow the IANA Language Subtag Registry. The pipeline provides several scripts, each described in its own section.

To quickly demo it using a dummy lexicon, run:

git clone https://github.com/ZurichNLP/spoken-to-signed-translation
cd spoken-to-signed-translation

text_to_gloss_to_pose \
  --text "Kleine Kinder essen Pizza." \
  --glosser "simple" \
  --lexicon "assets/dummy_lexicon" \
  --spoken-language "de" \
  --signed-language "sgg" \
  --pose "quick_test.pose"

Text-to-Gloss Translation

This script translates input text into gloss notation.

text_to_gloss \
  --text <input_text> \
  --glosser <simple|spacylemma|rules|nmt> \
  --spoken-language <de|fr|it> \
  --signed-language <sgg|ssr|slf>

Pose-to-Video Conversion

This script converts a pose file into a video file.

pose_to_video \
  --pose <pose_file_path>.pose \
  --video <output_video_file_path>.mp4

Text-to-Gloss-to-Pose Translation

This script translates input text into gloss notation, then converts the glosses into a pose file.

text_to_gloss_to_pose \
  --text <input_text> \
  --glosser <simple|spacylemma|rules|nmt> \
  --lexicon <path_to_directory> \
  --spoken-language <de|fr|it> \
  --signed-language <sgg|ssr|slf> \
  --pose <output_pose_file_path>.pose

Text-to-Gloss-to-Pose-to-Video Translation

This script translates input text into gloss notation, converts the glosses into a pose file, and then transforms the pose file into a video.

text_to_gloss_to_pose_to_video \
  --text <input_text> \
  --glosser <simple|spacylemma|rules|nmt> \
  --lexicon <path_to_directory> \
  --spoken-language <de|fr|it> \
  --signed-language <sgg|ssr|slf> \
  --video <output_video_file_path>.mp4
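
To drive the pipeline from Python rather than a shell, the same command can be assembled as an argument list and passed to `subprocess.run`. The sketch below is only an illustration: the helper name `build_pipeline_command` is hypothetical, the output file name is made up, and it assumes the `text_to_gloss_to_pose_to_video` entry point is on `PATH` after installation. It uses only the flags documented above.

```python
import shlex
import subprocess


def build_pipeline_command(text, glosser, lexicon, spoken_language,
                           signed_language, video):
    """Assemble the argument list for the text_to_gloss_to_pose_to_video script.

    Hypothetical helper: only the flags documented in this README are used.
    """
    return [
        "text_to_gloss_to_pose_to_video",
        "--text", text,
        "--glosser", glosser,
        "--lexicon", lexicon,
        "--spoken-language", spoken_language,
        "--signed-language", signed_language,
        "--video", video,
    ]


command = build_pipeline_command(
    text="Kleine Kinder essen Pizza.",
    glosser="simple",
    lexicon="assets/dummy_lexicon",
    spoken_language="de",
    signed_language="sgg",
    video="quick_test.mp4",  # illustrative output path
)

# Print a copy-pasteable shell version of the command.
print(shlex.join(command))

# Uncomment to actually run the pipeline (requires the package to be installed):
# subprocess.run(command, check=True)
```

Passing a list instead of a single string means the input text needs no manual shell quoting.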

Methodology

The pipeline consists of three main components:

  1. Text-to-Gloss Translation: Transforms the input (spoken language) text into a sequence of glosses.
  2. Gloss-to-Pose Conversion:
     • Lookup: Uses a signed language lexicon to convert the sequence of glosses into a sequence of poses.
     • Pose Concatenation: Crops, concatenates, and smooths the poses to create a pose representation of the input sentence.
  3. Pose-to-Video Generation: Transforms the processed pose sequence into a synthesized video using an image translation model.
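
The first two components can be illustrated with a toy sketch. This is not the repository's implementation: the glosser here simply strips punctuation and upper-cases each word, the lexicon is a made-up dictionary in which every "pose" is a list of frames holding a single (x, y) keypoint, and linear interpolation between clips stands in for the smoothing step. Cropping and video generation are omitted. All names (`TOY_LEXICON`, `text_to_gloss`, `lookup`, `concatenate`) are hypothetical.

```python
# Toy lexicon (assumption): gloss -> pose clip, each frame is one (x, y) keypoint.
TOY_LEXICON = {
    "KINDER": [[0.0, 0.0], [1.0, 1.0]],
    "ESSEN": [[4.0, 4.0], [5.0, 5.0]],
    "PIZZA": [[8.0, 2.0], [9.0, 3.0]],
}


def text_to_gloss(text):
    """Toy glosser: strip punctuation and upper-case each word."""
    words = (word.strip(".,!?;:") for word in text.split())
    return [word.upper() for word in words if word]


def lookup(glosses, lexicon):
    """Return one pose clip per gloss, skipping glosses missing from the lexicon."""
    return [lexicon[gloss] for gloss in glosses if gloss in lexicon]


def concatenate(clips, transition_frames=2):
    """Join non-empty clips, inserting interpolated frames between neighbours."""
    frames = []
    for clip in clips:
        if frames:
            start, end = frames[-1], clip[0]
            for step in range(1, transition_frames + 1):
                t = step / (transition_frames + 1)
                # Linear blend from the last frame so far to the next clip's first frame.
                frames.append([a + t * (b - a) for a, b in zip(start, end)])
        frames.extend(clip)
    return frames


glosses = text_to_gloss("Kleine Kinder essen Pizza.")
clips = lookup(glosses, TOY_LEXICON)
frames = concatenate(clips)
print(glosses)
print(len(frames), "frames in the concatenated pose sequence")
```

In this toy run the gloss for "Kleine" has no lexicon entry and is skipped, so the sentence is rendered from the three remaining clips.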

Supported Languages

| Language | IANA Code | Glossers Supported | Lexicon Data Source |
|---|---|---|---|
| Swiss German Sign Language | sgg | simple, spacylemma, rules, nmt | SignSuisse (de) |
| Swiss French Sign Language | ssr | simple, spacylemma | SignSuisse (fr) |
| Swiss Italian Sign Language | slf | simple, spacylemma | SignSuisse (it) |
| German Sign Language | gsg | simple, spacylemma, nmt | WordNet (Coming Soon) |
| British Sign Language | bfi | simple, spacylemma, nmt | WordNet (Coming Soon) |
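
When scripting around the pipeline, it can help to check a language/glosser combination before launching a run. The sketch below encodes the support table above as a plain dictionary. The names `SUPPORTED_GLOSSERS` and `check_glosser` are hypothetical and not part of the package, and the table may drift from the code, so treat this as a convenience check rather than an authoritative one.

```python
# Glossers listed per signed language code, copied from the table above.
SUPPORTED_GLOSSERS = {
    "sgg": {"simple", "spacylemma", "rules", "nmt"},
    "ssr": {"simple", "spacylemma"},
    "slf": {"simple", "spacylemma"},
    "gsg": {"simple", "spacylemma", "nmt"},
    "bfi": {"simple", "spacylemma", "nmt"},
}


def check_glosser(signed_language, glosser):
    """Raise ValueError if the table does not list this glosser for the language."""
    if signed_language not in SUPPORTED_GLOSSERS:
        raise ValueError(f"Unknown signed language code: {signed_language!r}")
    allowed = SUPPORTED_GLOSSERS[signed_language]
    if glosser not in allowed:
        raise ValueError(
            f"Glosser {glosser!r} is not listed for {signed_language!r}; "
            f"listed glossers: {sorted(allowed)}"
        )


check_glosser("sgg", "rules")  # listed in the table, so no error is raised

try:
    check_glosser("ssr", "rules")
except ValueError as error:
    print(error)
```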

Citation

If you find this work useful, please cite our paper:

@inproceedings{moryossef2023baseline,
  title={An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation},
  author={Moryossef, Amit and M{\"u}ller, Mathias and G{\"o}hring, Anne and Jiang, Zifan and Goldberg, Yoav and Ebling, Sarah},
  booktitle={2nd International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)},
  year={2023},
  month={June},
  url={https://github.com/ZurichNLP/spoken-to-signed-translation},
  note={Available at: \url{https://arxiv.org/abs/2305.17714}}
}