Unofficial implementation of Mega-TTS 2
- Prepare dataset
- VQ-GAN
- ADM
- PLM
- python=3.10
- lightning
- lhotse
- transformers
- pypinyin
- WeTextProcessing
- phonemizer
- conda create -n aligner && conda activate aligner
- conda install -c conda-forge montreal-forced-aligner=2.2.17
- Put the wav and txt files in ./data/wavs
- Run
python3 prepare_ds.py --stage 0 --num_workers 4 --wavtxt_path data/wavs --text_grid_path data/textgrids --ds_path data/ds
- mfa model download acoustic mandarin_mfa
- mfa align data/wavs utils/mandarin_pinyin_to_mfa_lty.dict mandarin_mfa data/textgrids --clean -j 12 -t /workspace/tmp
- Run
python3 prepare_ds.py --stage 1 --num_workers 4 --wavtxt_path data/wavs --text_grid_path data/textgrids --ds_path data/ds
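
Before running the alignment step, it can help to confirm that every recording has a transcript. The following is a minimal sketch, not part of this repository: it assumes each wav file is paired with a txt file of the same name in the same folder under data/wavs, which is what the steps above suggest but which prepare_ds.py may not strictly require. The function name `find_unpaired` is hypothetical.

```python
from pathlib import Path


def find_unpaired(corpus_dir):
    """Return (wavs_without_txt, txts_without_wav) as sorted lists.

    Assumption: a recording "x/a.wav" is paired with a transcript "x/a.txt".
    Entries are paths relative to corpus_dir with the extension removed.
    """
    corpus = Path(corpus_dir)

    def keys(pattern):
        # Strip the extension so "a.wav" and "a.txt" map to the same key.
        return {
            p.relative_to(corpus).with_suffix("").as_posix()
            for p in corpus.rglob(pattern)
        }

    wav_keys = keys("*.wav")
    txt_keys = keys("*.txt")
    return sorted(wav_keys - txt_keys), sorted(txt_keys - wav_keys)


if __name__ == "__main__":
    # "data/wavs" is the corpus path used by the commands above.
    missing_txt, missing_wav = find_unpaired("data/wavs")
    print(f"{len(missing_txt)} wav files without a transcript")
    print(f"{len(missing_wav)} transcripts without a wav file")
```

If either list is non-empty, fixing the missing files first is likely to save a rerun of `mfa align`.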
For the training procedure, refer to the PyTorch Lightning documentation.
- MIT
- Supported by Simon of ZideAI