Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic-Speech-Recognition

UnsupTTS is an unsupervised text-to-speech (TTS) system trained on unpaired (non-parallel) speech and text data.

If you find this project useful, please consider citing our paper.

@inproceedings{Ni-etal-2022-unsup-tts,
  author={Junrui Ni and Liming Wang and Heting Gao and Kaizhi Qian and Yang Zhang and Shiyu Chang and Mark Hasegawa-Johnson},
  title={Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition},
  booktitle={arXiv},
  year={2022},
  url={https://arxiv.org/pdf/2203.15796.pdf}
}

Speech demo

Speech samples can be found here

Dependencies

How to run it?

  1. Download the LJSpeech and CSS10 datasets, then edit the paths and settings in source_code/unsupervised/run_css10_cpy2.slurm and tts1/css10_nl/run.sh. The default configuration is Dutch (nl) with phoneme transcripts; set the $lang variable to choose a different language and the $trans_type variable to choose the transcript type.

  2. Run bash run_css10_cpy2.slurm
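Step 1 amounts to editing a few shell variable assignments. The sketch below assumes that lang and trans_type are assigned as plain name=value lines starting at the beginning of a line in source_code/unsupervised/run_css10_cpy2.slurm and tts1/css10_nl/run.sh; check the scripts before relying on it. The helper name set_var and the example values (fi, char, taken from the pretrained-model table) are illustrative assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): rewrite a "name=value"
# assignment in a script in place. Assumes the assignment starts at the
# beginning of a line, e.g. "lang=nl"; all other lines are left untouched.
set_var() {
  local file=$1 name=$2 value=$3
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak "s|^${name}=.*|${name}=${value}|" "$file" && rm -f "${file}.bak"
}

# Example (assumed paths and values): switch to Finnish character transcripts,
# then launch step 2.
# for f in source_code/unsupervised/run_css10_cpy2.slurm tts1/css10_nl/run.sh; do
#   set_var "$f" lang fi
#   set_var "$f" trans_type char
# done
# bash run_css10_cpy2.slurm
```

Editing the assignments with sed keeps the rest of each script unchanged; if the scripts instead read these values from arguments or the environment, set them there rather than using this helper.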

Pretrained models

LJSpeech
Language  ASR   TTS
en        link  link

CSS10
Language  Unit  ASR   TTS
ja        char  link  link
hu        char  link  link
nl        char  link  link
fi        char  link  link
es        char  link  link
de        char  link  link
hu        phn   link  link
nl        phn   link  link
fi        phn   link  link
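Not every language has a released checkpoint for every transcript type. The sketch below encodes the CSS10 rows of the Pretrained models table above so that a $lang/$trans_type choice from step 1 can be checked before a run. It assumes that char means character transcripts and phn means phoneme transcripts, and that $trans_type accepts these same codes; both are assumptions, and has_css10_model is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeed (exit status 0)
# if the CSS10 rows of the pretrained-model table list a checkpoint for the
# given language and unit. Unit codes follow the table: char, phn.
has_css10_model() {
  local pair="$1:$2"
  case "$pair" in
    ja:char|hu:char|nl:char|fi:char|es:char|de:char) return 0 ;;
    hu:phn|nl:phn|fi:phn) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumed variable values):
# has_css10_model "$lang" "$trans_type" && echo "pretrained CSS10 model listed"
```

A case statement keeps the list readable and mirrors the table row by row; if new checkpoints are released, the patterns need to be updated by hand.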