
Can you suggest settings for training with a 16000 Hz sample rate and n_fft 1024?

bharaniyv opened this issue · 7 comments

I want to retrain the model from scratch at a 16 kHz sample rate with n_fft 1024, n_shift 256, and window_length 1024, so that the output model can be used with standard vocoders, but it is not working. Can you suggest the changes required for training with these settings?



Do you use the same settings for the VC model and the vocoder?
The settings to check include n_fft, n_shift, and the log transformation.
For ParallelWaveGAN, we also use the log mel-spectrogram.
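
One quick way to run this check is to compare the two feature configurations key by key. The sketch below is only an illustration: the key names and all values are hypothetical placeholders, not the actual defaults of TriAAN-VC, ParallelWaveGAN, or any other vocoder, so substitute the values from your own config files.

```python
# Minimal sketch: report feature-extraction settings that differ between
# a VC model config and a vocoder config.
# All key names and values below are hypothetical placeholders.

FEATURE_KEYS = ("sample_rate", "n_fft", "n_shift", "win_length", "log_transform")


def find_mismatches(vc_cfg, vocoder_cfg, keys=FEATURE_KEYS):
    """Return {key: (vc_value, vocoder_value)} for every key whose values differ."""
    return {
        key: (vc_cfg.get(key), vocoder_cfg.get(key))
        for key in keys
        if vc_cfg.get(key) != vocoder_cfg.get(key)
    }


# Placeholder values for illustration only.
vc_cfg = {
    "sample_rate": 16000,
    "n_fft": 400,
    "n_shift": 160,
    "win_length": 400,
    "log_transform": "log10",
}
vocoder_cfg = {
    "sample_rate": 16000,
    "n_fft": 1024,
    "n_shift": 256,
    "win_length": 1024,
    "log_transform": "ln",
}

mismatches = find_mismatches(vc_cfg, vocoder_cfg)
for key, (vc_value, vocoder_value) in mismatches.items():
    print(f"{key}: VC={vc_value} vocoder={vocoder_value}")
```

Any key reported here means the vocoder would receive features it was not trained on, even if VC training itself runs without errors.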


Hi, I want to use the pre-released HiFi-GAN model instead of PWG. It uses n_fft 1024 and n_shift 256, so I wanted to retrain TriAAN-VC with those options, but I am running into errors. Can you suggest any changes to make it work with those parameters?

Does the error occur during training, or is the problem the performance after retraining?
Actually, I am also in the process of training TriAAN-VC on Libri-TTS, which is compatible with HiFi-GAN.
Since HiFi-GAN takes mel-spectrograms as input, the preprocessing step should be changed (this repo provides the steps for the log mel-spectrogram). You may refer to the TacotronSTFT functions for the steps.

The error is a size mismatch between the dimensions of the CPC input and the mel input to the encoder-decoder model.

The sizes can differ because the pre-trained CPC extractor processes audio in 10 ms steps.
So if you change the configuration, the CPC extractor has to be retrained.
If training the CPC extractor is too costly, it may be better to use the mel-spectrogram version instead of the CPC version.
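
To make the mismatch concrete, here is a rough sketch of the frame arithmetic. It assumes the 10 ms step corresponds to a hop of 160 samples at 16 kHz, and it uses a simplified "one frame per hop, plus one" count; the exact framing of the CPC extractor and of the mel pipeline may differ, and the 3-second clip length is arbitrary.

```python
# Rough sketch: why CPC features and mel frames stop lining up
# when n_shift changes from 160 to 256 at 16 kHz.
# The frame-count formula is a simplifying assumption, not the repo's code.

SAMPLE_RATE = 16000
CPC_SHIFT = 160   # assumed: 10 ms at 16 kHz
MEL_SHIFT = 256   # the n_shift requested in this issue


def frame_shift_ms(n_shift, sample_rate):
    """Time between successive frames, in milliseconds."""
    return 1000.0 * n_shift / sample_rate


def approx_num_frames(num_samples, n_shift):
    """Approximate frame count: one frame per hop, plus one."""
    return num_samples // n_shift + 1


num_samples = 3 * SAMPLE_RATE  # an arbitrary 3-second utterance

cpc_frames = approx_num_frames(num_samples, CPC_SHIFT)
mel_frames = approx_num_frames(num_samples, MEL_SHIFT)

print(frame_shift_ms(CPC_SHIFT, SAMPLE_RATE))  # 10.0
print(frame_shift_ms(MEL_SHIFT, SAMPLE_RATE))  # 16.0
print(cpc_frames, mel_frames)                  # 301 188
```

Under these assumptions the two feature streams have different lengths along the time axis for the same audio, which matches the kind of size mismatch reported above.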

Thanks for the clarification. But the mel-spectrogram version is not as good as the CPC version, right? Have you tried other options like WavLM or wav2vec? Do you think any of those would work?

I have tried wav2vec, but as far as I remember, it did not bring a significant improvement.