Unofficial PyTorch Lightning implementation of VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design (Kong et al., Interspeech 2023) [arXiv]
```bash
# clone project
git clone https://github.com/Normalist-K/lightning-vits2
cd lightning-vits2

# [OPTIONAL] create conda environment
conda create -n myenv python=3.11
conda activate myenv

# install pytorch according to the instructions at
# https://pytorch.org/get-started/

# install requirements
# (you may need to install espeak first: apt-get install espeak)
pip install -r requirements.txt
```
```bash
# clone project
git clone https://github.com/Normalist-K/lightning-vits2
cd lightning-vits2

# create conda environment and install dependencies
conda env create -f environment.yaml -n myenv

# activate conda environment
conda activate myenv
```
1. Download and extract the LJ Speech dataset, then rename or create a link to the dataset folder: `ln -s /path/to/LJSpeech-1.1/wavs DUMMY1`
2. For the multi-speaker setting, download and extract the VCTK dataset, downsample the wav files to 22050 Hz, then rename or create a link to the dataset folder: `ln -s /path/to/VCTK-Corpus/downsampled_wavs DUMMY2`
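VCTK ships at a higher sample rate (48 kHz), so its wavs must be downsampled to 22050 Hz before linking. A minimal numpy sketch of the downsampling idea (linear interpolation only; for real preprocessing prefer a band-limited resampler such as `scipy.signal.resample_poly`, `sox`, or `ffmpeg`):

```python
import numpy as np

def resample_linear(wav: np.ndarray, orig_sr: int, target_sr: int = 22050) -> np.ndarray:
    """Resample a mono waveform by linear interpolation.

    Sketch only: linear interpolation aliases on real audio, so use a
    band-limited resampler (e.g. scipy.signal.resample_poly) for actual
    dataset preprocessing.
    """
    n_out = int(round(len(wav) * target_sr / orig_sr))
    # output sample times expressed in input-sample units
    t_out = np.arange(n_out) * (orig_sr / target_sr)
    return np.interp(t_out, np.arange(len(wav)), wav)
```

For example, one second of 48 kHz audio (48000 samples) comes back as 22050 samples.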
```bash
# build the Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
```
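Monotonic Alignment Search is compiled as a Cython extension for speed, but conceptually it is a small dynamic program over a text-by-mel log-likelihood matrix. A plain-numpy sketch of the algorithm (illustrative only; training uses the repo's compiled version):

```python
import numpy as np

def monotonic_alignment_search(log_p: np.ndarray) -> np.ndarray:
    """Find the monotonic text-to-mel alignment maximizing total log-likelihood.

    log_p: [t_text, t_mel] per-position log-likelihoods.
    Returns a 0/1 path matrix of the same shape with exactly one active
    text index per mel frame, moving monotonically forward in time.
    """
    t_x, t_y = log_p.shape
    # Q[i, j] = best cumulative score of a monotonic path from (0, 0) to (i, j)
    Q = np.full((t_x, t_y), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, t_y):
        for i in range(min(j + 1, t_x)):  # the path cannot rise above the diagonal
            stay = Q[i, j - 1]                                 # repeat text index
            advance = Q[i - 1, j - 1] if i > 0 else -np.inf    # move to next text index
            Q[i, j] = log_p[i, j] + max(stay, advance)
    # backtrack from (t_x - 1, t_y - 1)
    path = np.zeros((t_x, t_y), dtype=np.int64)
    i = t_x - 1
    for j in range(t_y - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and (i == j or Q[i - 1, j - 1] > Q[i, j - 1]):
            i -= 1
    return path
```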
```bash
# text preprocessing for the filelists
# python src/utils/preprocess.py --text_index 1 --filelists filelists/ljs_audio_text_train_filelist.txt filelists/ljs_audio_text_val_filelist.txt filelists/ljs_audio_text_test_filelist.txt
# python src/utils/preprocess.py --text_index 2 --filelists filelists/vctk_audio_sid_text_train_filelist.txt filelists/vctk_audio_sid_text_val_filelist.txt filelists/vctk_audio_sid_text_test_filelist.txt
```
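The `--text_index` flag names the pipe-separated column holding the transcript: LJ Speech filelists look like `path|text` (index 1) and VCTK's like `path|speaker_id|text` (index 2). A minimal reader sketch (the helper name `load_filelist` is illustrative, not a function from this repo):

```python
def load_filelist(path: str, text_index: int):
    """Read a pipe-separated filelist into (audio_path, text) pairs."""
    items = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            # parts[0] is always the audio path; text_index selects the transcript
            items.append((parts[0], parts[text_index]))
    return items
```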
Train the model with the default configuration:

```bash
python src/train.py trainer=gpu model=vits2_multi data=vctk logger=tensorboard
```
Train the model with a chosen experiment configuration from configs/experiment/:

```bash
python src/train.py experiment=experiment_name.yaml
```
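An experiment file bundles a set of overrides so a run is reproducible from a single name. A hypothetical `configs/experiment/` entry in the lightning-hydra-template style (the filename and values below are illustrative, not files shipped with this repo):

```yaml
# @package _global_
# configs/experiment/vits2_vctk.yaml (hypothetical example)

defaults:
  - override /model: vits2_multi
  - override /data: vctk
  - override /trainer: gpu
  - override /logger: tensorboard

tags: ["vits2", "vctk"]
seed: 12345
```

It would then be launched with `python src/train.py experiment=vits2_vctk`.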
Dry run for debugging:

```bash
python src/train.py debug=default data=vctk_dev
python src/train.py debug=limit # use only a small portion of the data
```
The code follows the structure of [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template), which provides many conveniences (Hydra config composition, experiment and debug configs, logger integrations); see the template's documentation for details.
- Implement single-speaker training code
- Implement inference code
- Upload pretrained model
This repository is heavily inspired by the PyTorch VITS2 repository.