Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

This is the official code repository of Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis.

Demo Page: https://shivammehta25.github.io/Diff-TTSG/

Hugging Face Space: https://huggingface.co/spaces/shivammehta25/Diff-TTSG

We present Diff-TTSG, the first diffusion model that jointly learns to synthesise speech and gestures together. Our method is probabilistic and non-autoregressive, and can be trained on small datasets from scratch. In addition, to showcase the efficacy of these systems and pave the way for their evaluation, we describe a set of careful uni- and multi-modal subjective tests for evaluating integrated speech and gesture synthesis systems.

Teaser (click the image to be redirected to the YouTube video)

Watch the video

Installation

  1. Clone this repository

    git clone https://github.com/shivammehta25/Diff-TTSG.git
    cd Diff-TTSG
  2. Create a new environment (optional)

    conda create -n diff-ttsg python=3.10 -y
    conda activate diff-ttsg
  3. Set up Diff-TTSG (this installs all the dependencies and downloads the pretrained models)

    • If you are using Linux or macOS, run the following command:
    make install
    • Otherwise, install all dependencies and run the alignment build with:
    pip install -e .
  4. Run the Gradio UI

    gradio app.py

Alternatively, use the synthesis.ipynb notebook.
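If the steps above fail, a quick environment check can help narrow down the cause. The following is a minimal sketch, not part of the repository: the helper names `python_matches` and `missing_modules` are hypothetical, and treating Python 3.10 as the expected version is an assumption based only on the `conda create` command in step 2. It checks the interpreter version and whether the `gradio` package, which step 4 relies on, can be found.

```python
import importlib.util
import sys


def python_matches(required=(3, 10), version=None):
    """Return True if the major.minor version equals `required`.

    Assumption: 3.10 is taken from the conda command in the install steps;
    other versions may also work.
    """
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_matches():
        found = sys.version.split()[0]
        print("Note: the install steps use Python 3.10, found", found)
    for name in missing_modules(["gradio"]):
        print("Not installed:", name)
```

Running the script inside the activated `diff-ttsg` environment prints nothing when both checks pass.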

Pretrained checkpoint (should be downloaded automatically when running either make install or gradio app.py)

Citation information

If you use or build on our method or code for your research, please cite our paper:

@inproceedings{mehta2023diff,
  author={Mehta, Shivam and Wang, Siyang and Alexanderson, Simon and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  title={{D}iff-{TTSG}: {D}enoising probabilistic integrated speech and gesture synthesis},
  year={2023},
  booktitle={Proc. ISCA Speech Synthesis Workshop (SSW)},
  pages={150--156},
  doi={10.21437/SSW.2023-24}
}

Acknowledgement

The code in the repository is heavily inspired by the source code of