MB-iSTFT-VITS2


A... vits2_pytorch and MB-iSTFT-VITS hybrid... Gods, an abomination! Who created this atrocity?

This is an experimental build, so performance is not guaranteed.

Pre-requisites

  1. Python >= 3.8
  2. PyTorch version 1.13.1 (+cu1xx)
  3. CUDA
  4. Clone this repository
  5. Install the Python requirements. Please refer to requirements.txt
    1. You may need to install espeak first: apt-get install espeak
  6. Prepare datasets
    1. Example: download and extract the LJ Speech dataset, then rename the dataset folder or create a link to it: ln -s /path/to/LJSpeech-1.1/wavs DUMMY1
  7. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
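
Before building, the version prerequisites listed above (Python >= 3.8, PyTorch 1.13.1, CUDA) can be sanity-checked from Python. The following is only a minimal sketch and not part of this repository: the helper names check_python, check_torch and report are hypothetical, and comparing against the "1.13.1" prefix is an assumption about how strictly the version should match.

```python
import sys


def check_python(min_version=(3, 8), version_info=None):
    """Return True if the interpreter is at least `min_version`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(min_version)


def check_torch(expected_prefix="1.13.1"):
    """Return (installed_version_or_None, version_matches, cuda_available)."""
    try:
        import torch  # imported lazily so the Python check works without it
    except ImportError:
        return None, False, False
    version = str(torch.__version__)  # e.g. "1.13.1+cu117"
    return version, version.startswith(expected_prefix), torch.cuda.is_available()


def report():
    """Print a short summary of the checks (hypothetical helper)."""
    print("Python >= 3.8:", check_python())
    version, matches, cuda = check_torch()
    print("PyTorch version:", version, "(matches 1.13.1)" if matches else "(differs or missing)")
    print("CUDA available:", cuda)
```

Calling report() prints one line per prerequisite; a different PyTorch version is reported rather than treated as an error, since this build is experimental anyway.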

Setting json file in configs

Each entry lists the model, how to set up the json file in configs, and a sample json file configuration.

Model: iSTFT-VITS
  Settings: "istft_vits": true, "upsample_rates": [8,8],
  Sample config: istft_vits2_base.json

Model: MB-iSTFT-VITS
  Settings: "subbands": 4, "mb_istft_vits": true, "upsample_rates": [4,4],
  Sample config: mb_istft_vits2_base.json

Model: MS-iSTFT-VITS
  Settings: "subbands": 4, "ms_istft_vits": true, "upsample_rates": [4,4],
  Sample config: ms_istft_vits2_base.json

Model: Mini-iSTFT-VITS
  Settings: "istft_vits": true, "upsample_rates": [8,8], "hidden_channels": 96, "n_layers": 3,
  Sample config: mini_istft_vits2_base.json

Model: Mini-MB-iSTFT-VITS
  Settings: "subbands": 4, "mb_istft_vits": true, "upsample_rates": [4,4], "hidden_channels": 96, "n_layers": 3, "upsample_initial_channel": 256,
  Sample config: mini_mb_istft_vits2_base.json
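
To confirm which variant a given config file selects, the keys listed above can be read back from the JSON. This is a rough sketch, not code from this repository: the function names find_key, summarize and load_and_summarize are hypothetical, and because the nesting of these keys inside the config is not stated here, the sketch searches the whole document for each key instead of assuming a particular section.

```python
import json

# Variant flags named in the settings above.
VARIANT_FLAGS = ("istft_vits", "mb_istft_vits", "ms_istft_vits")


def find_key(node, key):
    """Depth-first search for `key` in nested dicts/lists; first non-None hit wins."""
    if isinstance(node, dict):
        if key in node and node[key] is not None:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def summarize(config):
    """Collect the variant flags set to true and the related size settings."""
    return {
        "variants": [f for f in VARIANT_FLAGS if find_key(config, f) is True],
        "subbands": find_key(config, "subbands"),
        "upsample_rates": find_key(config, "upsample_rates"),
        "hidden_channels": find_key(config, "hidden_channels"),
        "n_layers": find_key(config, "n_layers"),
    }


def load_and_summarize(path):
    """Read a config JSON from `path` and summarize it."""
    with open(path, "r", encoding="utf-8") as handle:
        return summarize(json.load(handle))
```

For example, load_and_summarize("configs/mini_mb_istft_vits2_base.json") would be expected to list mb_istft_vits under "variants" if the file follows the settings above. If a key such as n_layers appears in more than one section of a config, only the first match is reported.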

Training Example

python train.py -c configs/mini_mb_istft_vits2_base.json -m models/test

Credits