Supriya090/Shruti-NepaliSpeechSynthesis


Shruti - Nepali Speech Synthesis

Speech Synthesis Component of Shruti - A Nepali Audiobook Platform

The text-to-speech system has two components:

Mel-Spectrogram Generation

Fine-tuning Tacotron2 (Shen et al.) for mel-spectrogram generation
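Tacotron2 predicts a log-mel spectrogram rather than raw audio. The sketch below shows how such a training target can be computed from a waveform with NumPy. It assumes the hyperparameters of NVIDIA's public Tacotron2 implementation (22050 Hz audio, `n_fft=1024`, hop length 256, 80 mel bands between 0 and 8000 Hz); the notebooks in this repository may use different settings. For brevity it uses the HTK mel formula and unnormalised triangular filters, so its values will not match librosa or NVIDIA's STFT code exactly. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel scale: roughly linear below 1 kHz, logarithmic above.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(audio, sr=22050, n_fft=1024, hop_length=256,
                        n_mels=80, fmin=0.0, fmax=8000.0):
    """Return a (n_mels, n_frames) log-mel spectrogram of a mono waveform."""
    window = np.hanning(n_fft)
    # Reflect-pad so each frame is centred on its hop position.
    padded = np.pad(audio, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    frames = np.stack([
        padded[t * hop_length: t * hop_length + n_fft] * window
        for t in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels, fmin, fmax) @ magnitude.T
    # Clamp before the log so silent bins do not produce -inf.
    return np.log(np.clip(mel, 1e-5, None))
```

For one second of 22050 Hz audio this yields an array of shape (80, 87): one 80-dimensional frame every 256 samples, or about 11.6 ms.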

Vocoder Output

Using WaveGlow (Prenger et al.) and HiFi-GAN (Kong et al.) as vocoders to convert the generated mel-spectrograms into audio
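WaveGlow and HiFi-GAN are trained neural networks, so they cannot be reproduced in a few lines. To illustrate the job a vocoder does (turning magnitude-only spectrogram frames back into a waveform), the sketch below instead implements Griffin-Lim, the classical iterative phase-reconstruction algorithm used by the original Tacotron. This is a swapped-in technique, not what this repository uses. It operates on a linear-frequency magnitude STFT (a mel spectrogram would first need an approximate inverse of the mel filterbank), and the STFT settings (`n_fft=1024`, hop 256) are assumptions.

```python
import numpy as np

N_FFT, HOP = 1024, 256  # assumed STFT settings
WINDOW = np.hanning(N_FFT)


def stft(x):
    """Centred STFT, shape (n_frames, N_FFT // 2 + 1)."""
    padded = np.pad(x, N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    frames = np.stack([padded[t * HOP: t * HOP + N_FFT] * WINDOW
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Weighted overlap-add inverse of stft(), trimmed to `length` samples."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    total = N_FFT + HOP * (len(frames) - 1)
    out, norm = np.zeros(total), np.zeros(total)
    for t, frame in enumerate(frames):
        out[t * HOP: t * HOP + N_FFT] += frame
        norm[t * HOP: t * HOP + N_FFT] += WINDOW ** 2
    out /= np.maximum(norm, 1e-8)
    start = N_FFT // 2  # undo the centring pad
    return out[start: start + length]


def griffin_lim(magnitude, length, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        # Keep the target magnitude; adopt the phase of a consistent STFT.
        rebuilt = stft(istft(magnitude * phase, length))
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, length)
```

Griffin-Lim output typically has audible phase artifacts; neural vocoders such as WaveGlow and HiFi-GAN generate far more natural audio from mel-spectrograms.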

Training Data

Pretrained Tacotron2 model trained on the LJ Speech Dataset (Ito and Johnson)
Fine-tuning Phase 1: High-quality TTS data for Nepali (Sodimana et al.)
Fine-tuning Phase 2: Our own dataset, Nepali Text-to-Speech Data (Male and Female) (Khadka et al.)
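Fine-tuning an English checkpoint on Nepali text usually means the character-embedding table cannot be reused, because the Devanagari symbol set differs from the English one used for LJ Speech. NVIDIA's Tacotron2 implementation handles this with a warm-start option that loads every pretrained weight except the layers listed in `ignore_layers` (by default `embedding.weight`). The sketch below mirrors that merge logic with plain dictionaries standing in for PyTorch state dicts. Whether this repository's notebooks use exactly this mechanism is an assumption, and the function name, parameter names, and values in the example are hypothetical.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")


def warm_start_state(pretrained: Dict[str, T],
                     fresh: Dict[str, T],
                     ignore_layers: Iterable[str]) -> Dict[str, T]:
    """Merge a pretrained state dict into a freshly initialised one.

    Parameters named in `ignore_layers` keep their fresh initialisation;
    every other pretrained parameter overwrites the fresh value.
    """
    ignored = set(ignore_layers)
    kept = {name: value for name, value in pretrained.items()
            if name not in ignored}
    merged = dict(fresh)   # start from the new model's own parameters
    merged.update(kept)    # overwrite with the transferable pretrained ones
    return merged


# Hypothetical toy example: strings stand in for weight tensors.
english_ckpt = {
    "embedding.weight": "english-symbol-embeddings",
    "encoder.conv.weight": "pretrained-conv",
    "decoder.rnn.weight": "pretrained-rnn",
}
nepali_model = {
    "embedding.weight": "random-init-nepali-embeddings",
    "encoder.conv.weight": "random-init-conv",
    "decoder.rnn.weight": "random-init-rnn",
}
state = warm_start_state(english_ckpt, nepali_model, ["embedding.weight"])
print(state["embedding.weight"])     # random-init-nepali-embeddings
print(state["encoder.conv.weight"])  # pretrained-conv
```

Only the embedding starts from scratch; the encoder, attention, and decoder weights carry over what the model learned about aligning text with speech.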

Find the output samples here and the paper here.

If you use the code or dataset, please cite our work and all the references that we have cited.

  title={Nepali Text-to-Speech Synthesis using Tacotron2 for Melspectrogram Generation},
  author={Khadka, Supriya and Ranju, GC and Paudel, Prabin and Shah, Rahul and Joshi, Basanta},
  booktitle={Proc. 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023)},
  pages={73--77},
  year={2023}
}