Normalizing features
huukim136 opened this issue · 5 comments
Hi @jxzhanggg ,
I tried doing feature extraction without mel normalization and then ran the pre-training code. However, the results were not very good; you can listen to one of my samples below.
As you said, it is recommended to normalize the mel-spectrograms beforehand so that the model converges properly, so I'm trying to do that.
But I'm confused about how you calculated the mean and std, because each sample has a different length. For example, the extracted mel-spectrogram of one utterance has shape (80, 335) and another has shape (80, 500), so the shapes don't match.
Could you please explain how you do it? And if possible, could you share that part of the code?
Thank you!
Wav_275_146_ref_p293_VC.zip
Hi, you can concatenate all the sentences together along the time axis first. For example, a (80, 335) matrix concatenated with a (80, 500) matrix gives one big matrix of shape (80, 835). Then compute the mean and std over the big matrix: mean = np.mean(the_big_matrix, axis=1), std = np.std(the_big_matrix, axis=1). This gives one mean and one std value per mel bin, each of shape (80,).
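For anyone following along, here is a minimal sketch of the approach described above. It is not the repo's code: the function names compute_mel_stats and normalize_mel, the eps guard, and the random stand-in data are assumptions made for illustration only. It assumes each mel-spectrogram is a NumPy array of shape (80, n_frames).

```python
import numpy as np


def compute_mel_stats(mels):
    """Per-mel-bin mean and std over a list of (n_mels, n_frames_i) arrays.

    Hypothetical helper, not taken from the repo.
    """
    # Stack all utterances along the time axis: (n_mels, total_frames).
    the_big_matrix = np.concatenate(mels, axis=1)
    mean = np.mean(the_big_matrix, axis=1)  # shape (n_mels,)
    std = np.std(the_big_matrix, axis=1)    # shape (n_mels,)
    return mean, std


def normalize_mel(mel, mean, std, eps=1e-8):
    """Normalize one (n_mels, n_frames) array with the global statistics.

    eps is an assumed guard against division by zero for constant bins.
    """
    return (mel - mean[:, None]) / (std[:, None] + eps)


# Random stand-ins for two extracted mel-spectrograms of different lengths.
rng = np.random.default_rng(0)
mel_a = rng.normal(size=(80, 335))
mel_b = rng.normal(size=(80, 500))

mean, std = compute_mel_stats([mel_a, mel_b])
mel_a_norm = normalize_mel(mel_a, mean, std)
print(mean.shape, std.shape, mel_a_norm.shape)
```

For a large dataset the concatenated matrix may not fit in memory; in that case the same statistics could be accumulated from per-bin running sums and sums of squares instead, but that is a variation on what was described, not something stated in this thread.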
@jxzhanggg -- do you have code for this? It would be good to have in this repo, otherwise your results are not reproducible from this code alone
Thank you for the detailed explanation.
Btw, I agree with @JRMeyer that this part should be included in the code to reproduce your work.
Yes, I'm a bit busy these days. You're right, this is an essential step that is currently missing from the repo. I'll work on this.
I've added this part of the code here:
nonparaSeq2seqVC_code/pre-train/reader/extract_features.py
Lines 89 to 118 in 26cea9a