Unofficial implementation of Megatts2
- Prepare dataset
- VQ-GAN
- ADM
- PLM
- Replace HiFi-GAN with BigVGAN
- Mixed training on Chinese and English speech
- Train on about 1k hours of speech
- Webui
- Install montreal-forced-aligner 2.2.17 in its own conda environment:
```shell
conda create -n aligner && conda activate aligner
conda install -c conda-forge montreal-forced-aligner=2.2.17
```
- Prepare paired wav and txt files under ./data/wavs
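Before running alignment it is worth confirming that every wav actually has a transcript; a minimal stdlib-only sketch (the `check_pairs` helper and the one-`.txt`-per-`.wav` layout are assumptions, not code from this repo):

```python
from pathlib import Path

def check_pairs(wav_dir):
    """Split wavs into those with and without a sibling .txt transcript.

    Hypothetical helper: assumes one transcript per audio file, e.g.
    data/wavs/spk1/utt1.wav alongside data/wavs/spk1/utt1.txt.
    """
    paired, missing = [], []
    for wav in sorted(Path(wav_dir).rglob("*.wav")):
        (paired if wav.with_suffix(".txt").exists() else missing).append(wav.name)
    return paired, missing
```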
- Run
```shell
python3 prepare_ds.py --stage 0 --num_workers 4 --wavtxt_path data/wavs --text_grid_path data/textgrids --ds_path data/ds
```
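montreal-forced-aligner reads transcripts from `.lab` files placed next to each audio file, so stage 0 presumably produces those; a stdlib-only sketch of that step (the `make_lab_files` name and the txt-to-lab copy are assumptions, not the repo's actual code):

```python
from pathlib import Path

def make_lab_files(wav_dir):
    # Hypothetical sketch: copy each .txt transcript to the .lab file
    # that montreal-forced-aligner looks for next to the audio.
    written = []
    for txt in sorted(Path(wav_dir).rglob("*.txt")):
        lab = txt.with_suffix(".lab")
        lab.write_text(txt.read_text(encoding="utf-8").strip(), encoding="utf-8")
        written.append(lab.name)
    return written
```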
- Download the Mandarin acoustic model and run alignment:
```shell
mfa model download acoustic mandarin_mfa
mfa align data/wavs utils/mandarin_pinyin_to_mfa_lty.dict mandarin_mfa data/textgrids --clean -j 12 -t /workspace/tmp
```
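The resulting TextGrids can be inspected without extra dependencies; a rough regex-based reader for one tier of MFA's long-format output (`read_intervals` is a hypothetical helper; MFA names its phone tier "phones" by convention):

```python
import re

def read_intervals(textgrid_text, tier_name="phones"):
    """Extract (xmin, xmax, label) triples from one tier of a long-format TextGrid."""
    # Narrow the search to the requested tier, then grab each interval's fields.
    tier = re.search(
        r'name = "%s".*?(?=item \[|\Z)' % re.escape(tier_name),
        textgrid_text, re.S)
    if tier is None:
        return []
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "([^"]*)"')
    return [(float(a), float(b), t) for a, b, t in pattern.findall(tier.group(0))]
```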
- Run
```shell
python3 prepare_ds.py --stage 1 --num_workers 4 --wavtxt_path data/wavs --text_grid_path data/textgrids --ds_path data/ds
```
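Downstream duration modeling (the ADM) needs phone durations in frames rather than seconds; a sketch of that conversion (the function name and the 16 kHz / hop-256 defaults are assumptions, not values taken from this repo's configs):

```python
def intervals_to_frames(intervals, sample_rate=16000, hop_length=256):
    """Convert (xmin, xmax, label) phone intervals to integer frame counts.

    Hypothetical helper: sample_rate and hop_length must match the
    feature-extraction settings actually used by the VQ-GAN.
    """
    return [(label, round((xmax - xmin) * sample_rate / hop_length))
            for xmin, xmax, label in intervals]
```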
- After training the generator, run
```shell
python3 prepare_ds.py --stage 2 --generator_config configs/config_gan.yaml --generator_ckpt generator.ckpt
```
Training follows the standard PyTorch Lightning workflow.
- Run
```shell
python infer.py
```
- License: MIT
- Supported by Simon of ZideAI