My implementation of BigVGAN-base (paper) for JSUT (link), powered by Lightning.
The differences between this and HiFi-GAN (my implementation) are:
- LeakyReLU is replaced by AntiAliasActivation, which is composed of 2x upsampling, Snake, and 2x downsampling.
- The pre-activation before each ConvTranspose1d is removed, following the paper.
- Segment size is 32 instead of 64, because 64 exhausts VRAM.
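To illustrate the order of operations inside AntiAliasActivation, here is a framework-free sketch. It is not the code used in this repository. The function names are hypothetical, and linear interpolation and pair averaging stand in for the low-pass filters that a real implementation would use. Only the upsample, Snake, downsample ordering and the Snake formula `x + sin^2(alpha * x) / alpha` are taken from the description above and the paper.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / alpha


def upsample_2x(signal):
    """Double the sampling rate with linear interpolation (stand-in for a low-pass filter)."""
    out = []
    for i, value in enumerate(signal):
        out.append(value)
        # Midpoint between this sample and the next; repeat the last sample at the edge.
        nxt = signal[i + 1] if i + 1 < len(signal) else value
        out.append(0.5 * (value + nxt))
    return out


def downsample_2x(signal):
    """Halve the sampling rate by averaging adjacent pairs (a crude low-pass filter)."""
    return [0.5 * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]


def anti_alias_activation(signal, alpha=1.0):
    """Apply Snake at twice the sampling rate, then return to the original rate."""
    upsampled = upsample_2x(signal)
    activated = [snake(v, alpha) for v in upsampled]
    return downsample_2x(activated)


x = [0.0, 0.5, -0.5, 1.0]
y = anti_alias_activation(x)
print(len(x), len(y))
```

The output has the same length as the input, so the activation can replace LeakyReLU without changing tensor shapes. Applying the nonlinearity at the higher rate is what limits the aliasing it would otherwise introduce.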
Running run.sh automatically downloads the data and starts training, so just execute the following commands:
cd scripts
./run.sh
The synthesis script uses last.ckpt by default; to use a different checkpoint, edit the script accordingly.
cd scripts
./synthesis.sh
pip install torch torchaudio lightning pandas
Trained for 1000 epochs (612,000 steps) with batch_size = 16.
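As a quick sanity check on these numbers, the arithmetic below derives the steps per epoch. The README does not state whether the step counter advances once per batch or once per optimizer update, so the sample count is only an upper bound, assuming each step consumes at most one full batch.

```python
epochs = 1000
total_steps = 612_000
batch_size = 16

steps_per_epoch = total_steps // epochs
# Upper bound only: assumes every counted step consumes one full batch.
max_samples_per_epoch = steps_per_epoch * batch_size
print(steps_per_epoch, max_samples_per_epoch)
```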
The pretrained checkpoint is available here: https://huggingface.co/reppy4620/big_vgan_jsut/blob/main/jsut_1000.ckpt
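The link above points to the Hugging Face file page, not to the file itself. Below is a minimal sketch for fetching the checkpoint from a script, using only the standard library. The helper names `to_download_url` and `download_checkpoint` are hypothetical and not part of this repository. It relies on Hugging Face serving raw files when `/blob/` in the URL is replaced by `/resolve/`.

```python
import urllib.request

CKPT_PAGE_URL = "https://huggingface.co/reppy4620/big_vgan_jsut/blob/main/jsut_1000.ckpt"


def to_download_url(page_url):
    """Turn a Hugging Face 'blob' page URL into a direct 'resolve' file URL."""
    return page_url.replace("/blob/", "/resolve/", 1)


def download_checkpoint(page_url=CKPT_PAGE_URL, dest="jsut_1000.ckpt"):
    """Download the checkpoint to dest and return the local path."""
    urllib.request.urlretrieve(to_download_url(page_url), dest)
    return dest


# Example (performs a network download):
# path = download_checkpoint()
```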
Some audio samples are in asset/sample/.
loss | plot |
---|---|
Discriminator | |
Generator | |
Feature Matching | |
Mel | |