This project is for training tortoise-tts like model. Most of the codes are from tortoise tts and xtts.
Now only support mandarin.
pip install . -e
Training the model including many steps.
Use the ttts/prepare/bpe_all_text_to_one_file.py
to merge all text you have collected. To train the tokenizer, check the ttts/gpt/voice_tokenizer
for more info.
Use the vad_asr_save_to_jsonl.py
and save_mel_to_disk.py
to preprocess dataset.
Use the following instruction to train the model.
accelerate launch ttts/vqvae/train.py
Use save_mel_vq_to_disk.py
to preprocess mel vq. Run
accelerate launch ttts/gpt/train.py
to train the model.
You can choose any one vocoder of the following. You must choose one, due to the output reconstructed by the vqvae has low sound quality.
WIP
I chose the pretrained vocos as the vocoder for this project for no special reson. You can swap to any other ones like univnet.
After change the config.json properly
accelerate launch ttts/diffusion/train.py