MB-iSTFT-VITS2

A... vits2_pytorch and MB-iSTFT-VITS hybrid... Gods, an abomination! Who created this atrocity?

This is an Experimental build. Does not guarantee performance, therefore.

pre-requisites

Python >= 3.8
Pytorch version 1.13.1 (+cu1xx)
CUDA
Clone this repository
Install python requirements. Please refer requirements.txt
1. You may need to install espeak first: apt-get install espeak
Prepare datasets
1. ex) Download and extract the LJ Speech dataset, then rename or create a link to the dataset folder: ln -s /path/to/LJSpeech-1.1/wavs DUMMY1
Build Monotonic Alignment Search and run preprocessing if you use your own datasets.

# Cython-version Monotonoic Alignment Search
cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace

Model	How to set up json file in configs	Sample of json file configuration
iSTFT-VITS	`"istft_vits": true,` `"upsample_rates": [8,8],`	istft_vits2_base.json
MB-iSTFT-VITS	`"subbands": 4,` `"mb_istft_vits": true,` `"upsample_rates": [4,4],`	mb_istft_vits2_base.json
MS-iSTFT-VITS	`"subbands": 4,` `"ms_istft_vits": true,` `"upsample_rates": [4,4],`	ms_istft_vits2_base.json
Mini-iSTFT-VITS	`"istft_vits": true,` `"upsample_rates": [8,8],` `"hidden_channels": 96,` `"n_layers": 3,`	mini_istft_vits2_base.json
Mini-MB-iSTFT-VITS	`"subbands": 4,` `"mb_istft_vits": true,` `"upsample_rates": [4,4],` `"hidden_channels": 96,` `"n_layers": 3,` `"upsample_initial_channel": 256,`	mini_mb_istft_vits2_base.json

python train.py -c configs/mini_mb_istft_vits2_base.json -m models/test