UnsupTTS is an unsupervised text-to-speech (TTS) system learned from unpaired speech and text data.
If you find this project useful, please consider citing our paper.
```bibtex
@inproceedings{Ni-etal-2022-unsup-tts,
  author={Junrui Ni and Liming Wang and Heting Gao and Kaizhi Qian and Yang Zhang and Shiyu Chang and Mark Hasegawa-Johnson},
  title={Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition},
  booktitle={arXiv},
  year={2022},
  url={https://arxiv.org/pdf/2203.15796.pdf}
}
```
Speech samples can be found here.
- fairseq >= 1.0.0, with the dependencies for wav2vec-U
- ESPnet, pinned to commit `010f483e7661019761b169563ee622464125e56f` or earlier
- ParallelWaveGAN
- LanguageNet G2Ps (only for models using phoneme transcripts)
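One possible way to set up the dependencies above is sketched below; the repository URLs and the `parallel_wavegan` pip package name are assumptions based on the upstream projects, not part of this repo.

```shell
# Sketch of a dependency setup (repo URLs and package names assumed).
git clone https://github.com/pytorch/fairseq.git
pip install --editable ./fairseq          # fairseq >= 1.0.0 (wav2vec-U lives in examples/)

git clone https://github.com/espnet/espnet.git
git -C espnet checkout 010f483e7661019761b169563ee622464125e56f  # pinned commit

pip install parallel_wavegan              # ParallelWaveGAN vocoder

git clone https://github.com/uiuc-sst/g2ps.git  # LanguageNet G2Ps (phoneme transcripts only)
```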
- Download the LJSpeech and CSS10 datasets; modify the paths and settings in `source_code/unsupervised/run_css10_cpy2.slurm` and `tts1/css10_nl/run.sh`. The default language is Dutch (nl) with phoneme transcripts; change the `$lang` variable to switch the language and the `$trans_type` variable to switch the transcript type.
- Run `bash run_css10_cpy2.slurm`.
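As a sketch, the two variables to edit inside `run_css10_cpy2.slurm` would look like this (the surrounding script contents and the exact variable placement are assumptions; the language and transcript-type codes follow the tables below):

```shell
# Inside run_css10_cpy2.slurm (sketch; surrounding script content assumed).
lang=hu          # CSS10 language code: ja, hu, nl, fi, es, or de
trans_type=phn   # transcript type: "char" (characters) or "phn" (phonemes)
echo "training unsupervised TTS for ${lang} (${trans_type} transcripts)"
```

After editing, launch with `bash run_css10_cpy2.slurm` as above.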
| LJSpeech | ASR | TTS |
|---|---|---|
| en | link | link |
| CSS10 | Unit | ASR | TTS |
|---|---|---|---|
| ja | char | link | link |
| hu | char | link | link |
| nl | char | link | link |
| fi | char | link | link |
| es | char | link | link |
| de | char | link | link |
| hu | phn | link | link |
| nl | phn | link | link |
| fi | phn | link | link |