santi-pdp/segan

Issue while testing audio

HusainKapadia opened this issue · 1 comments

I have been facing a weird issue while testing. I successfully trained the SEGAN model for 19440 iterations with a batch size of 100. During training, the max and min values of the generated sample audio are printed every save_freq iterations. Almost all of the audio files range from about +0.55.... to -0.5....

Now, during testing on the same audio file from the training set with the same weights, the output behaves like this:

test wave min:-0.42119479179382324  max:0.497093141078949
[*] Reading checkpoints...
[*] Read SEGAN-19440
[*] Load SUCCESS
Cleaning chunk 0 -> 16384
gen wave, max:  [0.96146643] min:  [-0.9862874]
inp wave, max:  0.497093141078949 min:  -0.42119479179382324
canvas w shape:  (16384, 1)
Cleaning chunk 16384 -> 32768
gen wave, max:  [0.9773201] min:  [-0.9757471]
inp wave, max:  0.3213702440261841 min:  -0.2770885229110718
canvas w shape:  (16384, 1)
Cleaning chunk 32768 -> 36480
gen wave, max:  [0.99999225] min:  [-0.9999961]
inp wave, max:  0.04255741834640503 min:  -0.041153550148010254
canvas w shape:  (16384, 1)

The generated wav sounds even noisier than before, and the speech segments sound extremely loud and distorted. I have no idea why this is happening. Any help would be appreciated.
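
For anyone comparing their own runs, the mismatch in the log above can be put into numbers as a per-chunk peak gain (generated peak divided by input peak). This is only a rough sketch assuming NumPy: `peak_gain` is a hypothetical helper, not something from the SEGAN repo, and the arrays below hold just the min/max values copied from the log rather than the full waveforms.

```python
import numpy as np

def peak_gain(inp, gen, eps=1e-12):
    """Ratio of the generated peak amplitude to the input peak amplitude.

    Hypothetical helper for illustration; not part of the SEGAN code.
    """
    inp_peak = np.max(np.abs(inp))
    gen_peak = np.max(np.abs(gen))
    return float(gen_peak / max(inp_peak, eps))

# (min, max) pairs copied from the log above, one entry per chunk.
inp_ranges = [
    np.array([-0.42119479179382324, 0.497093141078949]),
    np.array([-0.2770885229110718, 0.3213702440261841]),
    np.array([-0.041153550148010254, 0.04255741834640503]),
]
gen_ranges = [
    np.array([-0.9862874, 0.96146643]),
    np.array([-0.9757471, 0.9773201]),
    np.array([-0.9999961, 0.99999225]),
]

gains = [peak_gain(i, g) for i, g in zip(inp_ranges, gen_ranges)]
for k, gain in enumerate(gains):
    print(f"chunk {k}: peak gain {gain:.2f}x")
```

The generated peaks sit close to ±1 in every chunk no matter how quiet the input is, so the gain grows as the input gets quieter. The last chunk is the worst case.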

Originally posted by @HusainKapadia in #38 (comment)

I solved the issue! I had added virtual batch normalization (VBN) for the generator as well... I removed VBN from the last layer of the generator and it worked well.
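
To illustrate the kind of change, here is a toy NumPy sketch in which every layer is normalized except the output layer. It is not the SEGAN generator: plain standardization stands in for VBN (an assumption, with no reference batch and no learned scale or shift), the layers are dense instead of convolutional, and `normalize`, `toy_generator` and `norm_last` are hypothetical names.

```python
import numpy as np

def normalize(x, eps=1e-5):
    # Plain standardization, used here only as a stand-in for VBN.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def toy_generator(x, weights, norm_last=False):
    """Stack of dense layers with tanh on the output layer.

    Hidden layers are always normalized; the output layer is
    normalized only when norm_last is True.
    """
    n = len(weights)
    for i, w in enumerate(weights):
        x = x @ w
        is_last = i == n - 1
        if not is_last or norm_last:
            x = normalize(x)
        if is_last:
            x = np.tanh(x)
        else:
            x = np.maximum(x, 0.2 * x)  # leaky ReLU
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))
weights = [
    rng.standard_normal((64, 64)),
    0.01 * rng.standard_normal((64, 64)),  # small output weights
]

peak_plain = float(np.max(np.abs(toy_generator(x, weights, norm_last=False))))
peak_norm = float(np.max(np.abs(toy_generator(x, weights, norm_last=True))))
print("output peak without last-layer norm:", peak_plain)
print("output peak with last-layer norm:   ", peak_norm)
```

In this toy setup, normalizing right before the tanh rescales the pre-activations to unit variance, so the output peak ends up near ±1 even when the last layer's weights are small. Whether that is exactly what happened in my trained model is a guess, but it matches the saturated `gen wave` values in the log.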