kan-bayashi/ParallelWaveGAN

How would you train for BW extension?


I'm interested in training a model that converts 24 kHz mel spectrograms to 48 kHz waveforms (like HiFi-GAN2). It might not work without changing the architecture, but that's ok. How would you modify the config files to do this? I've already run the recipe through stage 1 to extract features from downsampled VCTK. Now I'm unsure how to modify the generator parameters in the HiFi-GAN config so that it produces a waveform twice as long.

You can simply increase the upsampling scales here.

upsample_scales: [8, 8, 2, 2] # Upsampling scales.
upsample_kernel_sizes: [16, 16, 4, 4] # Kernel size for upsampling layers.

E.g.,

upsample_scales: [8, 8, 4, 2] # Upsampling scales.
upsample_kernel_sizes: [16, 16, 8, 4] # Kernel size for upsampling layers.