This repository provides the open-source code for our InterSpeech 2024 paper SimpleSpeech and our latest paper SimpleSpeech 2.
Note that we are still cleaning up the repository; this is not the final version.
Download your training dataset. If your dataset does not include text, first refer to asr_tools to obtain speech-text pairs.
Then run
python scripts/get_tsv_tts.py
to generate the TSV file, and then update the TSV path in tts.yaml.
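For readers who want to see what a speech-text manifest can look like, here is a minimal sketch. It is not the logic of scripts/get_tsv_tts.py: the two-column layout (audio path, transcript), the same-named `.txt` transcript convention, and the function names `collect_pairs` and `write_manifest` are all assumptions made for illustration, so check the actual script for the columns it expects.

```python
from pathlib import Path


def collect_pairs(audio_dir, ext=".wav"):
    """Pair each audio file with a same-named .txt transcript (assumed layout)."""
    pairs = []
    for audio in sorted(Path(audio_dir).rglob(f"*{ext}")):
        transcript = audio.with_suffix(".txt")
        if transcript.is_file():  # skip audio that has no transcript
            pairs.append((audio, transcript.read_text(encoding="utf-8")))
    return pairs


def write_manifest(pairs, tsv_path):
    """Write one 'audio_path<TAB>transcript' row per pair (hypothetical columns)."""
    with open(tsv_path, "w", encoding="utf-8") as f:
        for audio, text in pairs:
            # Collapse whitespace so tabs or newlines in the text cannot break a row
            clean = " ".join(text.split())
            f.write(f"{audio}\t{clean}\n")
```

A call such as `write_manifest(collect_pairs("data/wavs"), "train.tsv")` would produce the manifest, where `data/wavs` and `train.tsv` are placeholder paths.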
bash run.sh
bash eval.sh
If you find this code useful for your research, please cite:
@article{simplespeech,
  title={SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models},
  author={Yang, Dongchao and Wang, Dingdong and Guo, Haohan and Chen, Xueyuan and Wu, Xixin and Meng, Helen},
  journal={Proc. INTERSPEECH},
  year={2024}
}