jxzhanggg/nonparaSeq2seqVC_code

Normalizing features

huukim136 opened this issue · 5 comments

Hi @jxzhanggg,
I tried doing feature extraction without mel normalization and then ran the pre-training code. However, the results were not very good; you can listen to one of my samples below.
As you said, it is recommended to normalize the mel-spectrogram beforehand so that the model converges properly, so I'm trying to do that.

But I'm confused about how you calculated the mean and std, because each sample has a different length. For example, the extracted mel-spectrogram of one utterance has shape (80, 335) and another has shape (80, 500), so the shapes differ.
Could you please explain how you did it and, if possible, share that part of the code?
Thank you!
Wav_275_146_ref_p293_VC.zip

Hi, you can concatenate all the sentences together first. For example, (80, 335) is concatenated with (80, 500) along the time axis to get a big matrix with shape (80, 835). Then compute the mean and std from the big matrix: mean = np.mean(the_big_matrix, axis=1), std = np.std(the_big_matrix, axis=1).
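A minimal sketch of the concatenation described above, assuming mel-spectrograms shaped (n_mels, frames) as in that reply. The second function is not from the repo: it is an alternative (running sums of x and x², accumulated in float64) that yields the same per-bin statistics without ever building the big matrix, which can help when the whole corpus does not fit in memory. Both function names are hypothetical.

```python
import numpy as np

def stats_by_concatenation(mels):
    """Per-bin mean/std after concatenating (n_mels, T_i) arrays along time."""
    big = np.concatenate(mels, axis=1)  # shape (n_mels, sum of T_i)
    return np.mean(big, axis=1), np.std(big, axis=1)

def stats_by_running_sums(mels):
    """Hypothetical alternative: same statistics, one utterance at a time."""
    n_mels = mels[0].shape[0]
    total = np.zeros(n_mels, dtype=np.float64)
    total_sq = np.zeros(n_mels, dtype=np.float64)
    count = 0
    for m in mels:
        m = m.astype(np.float64)
        total += m.sum(axis=1)          # per-bin sum over frames
        total_sq += (m ** 2).sum(axis=1)
        count += m.shape[1]             # number of frames in this utterance
    mean = total / count
    # Population variance (ddof=0), matching np.std's default; clamp rounding noise
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Two synthetic utterances of different lengths
rng = np.random.default_rng(0)
mels = [rng.normal(size=(80, 335)), rng.normal(size=(80, 500))]
mean, std = stats_by_concatenation(mels)
mean2, std2 = stats_by_running_sums(mels)
print(np.concatenate(mels, axis=1).shape)  # (80, 835)
print(mean.shape, std.shape)  # (80,) (80,)
print(np.allclose(mean, mean2), np.allclose(std, std2))  # True True
```

Either way the result is one mean and one std per mel bin (shape (80,)), so utterance lengths no longer matter.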

@jxzhanggg, do you have code for this? It would be good to have it in this repo; otherwise your results are not reproducible from this code alone.


Thank you for the detailed explanation.
By the way, I agree with @JRMeyer that this part should be included in the code so that your work can be reproduced.

Yes, I'm a bit busy these days. You're right, this is an essential step that is currently missing from the repo. I'll work on it.

I've added this part of the code here:

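import os

import numpy as np


def estimate_mean_std(root, num=2000):
    '''
    Use the training data to estimate the per-dimension mean and standard deviation.
    Only $num utterances of each feature type are loaded, to avoid running out of memory.
    Feature files are expected to be time-major, i.e. shaped (frames, dims).
    '''
    specs, mels = [], []
    counter_sp, counter_mel = 0, 0
    # Walk the feature directory and collect up to `num` linear and mel spectrograms
    for dirpath, _, filenames in os.walk(root):
        for f in filenames:
            if f.endswith('.spec.npy') and counter_sp < num:
                path = os.path.join(dirpath, f)
                specs.append(np.load(path))
                counter_sp += 1
            if f.endswith('.mel.npy') and counter_mel < num:
                path = os.path.join(dirpath, f)
                mels.append(np.load(path))
                counter_mel += 1
    # Stack along the time axis: (T1, D), (T2, D), ... -> (T1 + T2 + ..., D)
    specs = np.vstack(specs)
    mels = np.vstack(mels)
    # Statistics per feature dimension (axis 0 is time here)
    mel_mean = np.mean(mels, axis=0)
    mel_std = np.std(mels, axis=0)
    spec_mean = np.mean(specs, axis=0)
    spec_std = np.std(specs, axis=0)
    # Each file holds a (2, D) array: row 0 is the mean, row 1 is the std
    np.save(os.path.join(root, "spec_mean_std.npy"),
            [spec_mean, spec_std])
    np.save(os.path.join(root, "mel_mean_std.npy"),
            [mel_mean, mel_std])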
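Note that estimate_mean_std stacks features with np.vstack and reduces over axis=0, so it treats each file as (frames, dims), whereas the earlier reply used (80, frames) with axis=1; both give one value per dimension. The sketch below shows one way the saved statistics could be applied; it is not taken from the repo. The helper names and the small eps guard against zero std are assumptions, and the demo writes synthetic data to a temporary directory rather than real features.

```python
import os
import tempfile

import numpy as np

def load_mean_std(path):
    # The stats file stores [mean, std] as a (2, D) array, so it unpacks row-wise
    mean, std = np.load(path)
    return mean, std

def normalize(mel, mean, std, eps=1e-8):
    # mel is time-major (T, D); mean and std of shape (D,) broadcast over time.
    # eps is an assumed guard against dimensions with zero variance.
    return (mel - mean) / (std + eps)

def denormalize(mel_norm, mean, std, eps=1e-8):
    # Inverse of normalize, e.g. before passing predicted features to a vocoder
    return mel_norm * (std + eps) + mean

# Synthetic time-major "mel" features with a non-zero mean and non-unit scale
rng = np.random.default_rng(0)
mels = [rng.normal(loc=-5.0, scale=2.0, size=(t, 80)) for t in (335, 500)]
stacked = np.vstack(mels)
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "mel_mean_std.npy")
    np.save(path, [stacked.mean(axis=0), stacked.std(axis=0)])
    mean, std = load_mean_std(path)

x = normalize(mels[0], mean, std)
restored = denormalize(x, mean, std)
print(np.allclose(restored, mels[0]))  # True
```

The same mean and std must be used at training and inference time, and model outputs need denormalize before they are converted back to audio.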