Tacotron + HiFiGAN vocoder for Vietnamese datasets.
Online demo at https://huggingface.co/spaces/ntt123/vietTTS.
A synthesized audio clip: clip.wav. A Colab notebook: notebook.
🔔 Check out the experimental multi-speaker branch (git checkout multi-speaker) for multi-speaker support. 🔔
git clone https://github.com/NTT123/vietTTS.git
cd vietTTS
pip3 install -e .
bash ./scripts/quick_start.sh
bash ./scripts/download_aligned_infore_dataset.sh
Note: this is a denoised and aligned version of the original dataset, which was donated by the InfoRe Technology company (see here). You can download the original dataset (InfoRe Technology 1) here.
The Montreal Forced Aligner (MFA) is used to align the transcripts with the speech, producing TextGrid files. Here is a Colab notebook to align the InfoRe dataset. Visit MFA for more information on how to create TextGrid files.
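To make the alignment output more concrete, here is a minimal sketch that reads the intervals from a TextGrid file. It assumes the long ("full text") Praat TextGrid layout, where each interval lists xmin, xmax, and text on consecutive lines. The helper name read_intervals and the sample content are hypothetical and are not part of vietTTS or MFA.

```python
import re

# Matches one interval entry in a long-format TextGrid (assumed layout).
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([\d.eE+-]+)\s*'
    r'xmax = ([\d.eE+-]+)\s*'
    r'text = "([^"]*)"'
)


def read_intervals(textgrid_text):
    """Return (start_sec, end_sec, label) tuples for every interval found."""
    return [
        (float(start), float(end), label)
        for start, end, label in INTERVAL_RE.findall(textgrid_text)
    ]


# Hypothetical sample content, for illustration only.
SAMPLE = '''
item [1]:
    class = "IntervalTier"
    name = "phones"
    xmin = 0
    xmax = 1.5
    intervals: size = 2
    intervals [1]:
        xmin = 0
        xmax = 0.5
        text = "sil"
    intervals [2]:
        xmin = 0.5
        xmax = 1.5
        text = "a"
'''

for start, end, label in read_intervals(SAMPLE):
    print(f"{label}: {end - start:.2f} s")
```

Per-phone durations of this kind are what a duration model learns from. For real projects, a dedicated TextGrid library is likely more robust than a regular expression.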
python -m vietTTS.nat.duration_trainer
python -m vietTTS.nat.acoustic_trainer
We use the original implementation from the HiFiGAN authors at https://github.com/jik876/hifi-gan. Use the config file at assets/hifigan/config.json to train your model.
git clone https://github.com/jik876/hifi-gan.git
# create dataset in hifi-gan format
ln -sf "$(pwd)/train_data" hifi-gan/data
cd hifi-gan/data
ls -1 *.TextGrid | sed -e 's/\.TextGrid$//' > files.txt
cd ..
head -n 100 data/files.txt > val_files.txt
tail -n +101 data/files.txt > train_files.txt
rm data/files.txt
# training
python train.py \
--config ../assets/hifigan/config.json \
--input_wavs_dir=data \
--input_training_file=train_files.txt \
--input_validation_file=val_files.txt
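The file-list preparation above (list the TextGrid base names, keep the first 100 for validation and the rest for training) can also be sketched in Python. This is a rough equivalent under stated assumptions, not part of either repository. The function names split_file_list and write_list are hypothetical, and Python's sorted order may differ slightly from the locale-dependent order of ls.

```python
from pathlib import Path


def split_file_list(data_dir, n_val=100):
    """Return (validation, training) lists of TextGrid base names.

    Mirrors the shell steps: strip the .TextGrid suffix, sort the names,
    take the first n_val for validation and the remainder for training.
    """
    names = sorted(p.stem for p in Path(data_dir).glob("*.TextGrid"))
    return names[:n_val], names[n_val:]


def write_list(path, names):
    """Write one base name per line, the format the file lists above use."""
    Path(path).write_text("".join(name + "\n" for name in names), encoding="utf-8")
```

Running split_file_list("data") from inside the hifi-gan directory and passing the two results to write_list with val_files.txt and train_files.txt would produce the same kind of lists as the head and tail commands.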
Fine-tune on ground-truth-aligned (GTA) mel-spectrograms:
cd /path/to/vietTTS # go to vietTTS directory
python -m vietTTS.nat.zero_silence_segments -o train_data # zero all [sil, sp, spn] segments
python -m vietTTS.nat.gta -o /path/to/hifi-gan/ft_dataset # create gta melspectrograms at hifi-gan/ft_dataset directory
# turn on finetune
cd /path/to/hifi-gan
python train.py \
--fine_tuning True \
--config ../assets/hifigan/config.json \
--input_wavs_dir=data \
--input_training_file=train_files.txt \
--input_validation_file=val_files.txt
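As an illustration of what zeroing the [sil, sp, spn] segments means, the sketch below sets every waveform sample that falls inside a silence-labelled interval to zero. It is a simplified stand-in, not the code of vietTTS.nat.zero_silence_segments. The function name zero_silence, the (start, end, label) interval format, and the toy sample rate are all assumptions.

```python
SILENCE_LABELS = {"sil", "sp", "spn"}


def zero_silence(samples, intervals, sample_rate):
    """Return a copy of samples with silence-labelled intervals set to 0.

    intervals holds (start_sec, end_sec, label) tuples, for example taken
    from the phone tier of an aligned TextGrid file.
    """
    out = list(samples)
    for start, end, label in intervals:
        if label not in SILENCE_LABELS:
            continue
        # Convert seconds to sample indices and clamp to the signal length.
        lo = max(0, int(round(start * sample_rate)))
        hi = min(len(out), int(round(end * sample_rate)))
        for i in range(lo, hi):
            out[i] = 0
    return out


# Toy signal: 2 seconds at 4 samples per second (illustrative values only).
wave = [1, 1, 1, 1, 1, 1, 1, 1]
phones = [(0.0, 0.5, "sil"), (0.5, 1.5, "a"), (1.5, 2.0, "sp")]
print(zero_silence(wave, phones, sample_rate=4))
```

Only the spoken segment in the middle keeps its samples. The leading "sil" and trailing "sp" segments are zeroed.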
Then use the following command to convert the PyTorch model to Haiku format:
cd ..
python -m vietTTS.hifigan.convert_torch_model_to_haiku \
--config-file=assets/hifigan/config.json \
--checkpoint-file=hifi-gan/cp_hifigan/g_[latest_checkpoint]
python -m vietTTS.synthesizer \
--lexicon-file=train_data/lexicon.txt \
--text="hôm qua em tới trường" \
--output=clip.wav