chrisdonahue/wavegan

WaveGAN: Audio synthesis counterexample where it fails for broadband signals (ambient noise)

hinash88 opened this issue · 3 comments

I have been trying to understand the WaveGAN paper.

1- WaveGAN: A shortcoming of WaveGAN mentioned in link is that it cannot work for higher frequencies. I have used WaveGAN on underwater ship engine audio data, which contains high-frequency components, as you can see in the attached FFT of a real motorboat sound (Figure 1), and those components appear in the GAN-generated audio (Figure 2) as well. Can you suggest a counterexample with high frequencies that WaveGAN cannot deal with?
Figure 1: Original Motorboat
Figure 2: WaveGAN_motorboat

2- If WaveGAN cannot deal with high-frequency components, how can we technically explain why it fails to incorporate them?

Your help will be highly valued.

To validate the high-frequency problem, I created a simple sinusoidal dataset of single-frequency tones (no harmonics) from 4000 Hz to 8000 Hz. Again, to my surprise, WaveGAN gives good results in mimicking the original data.
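A dataset like the one described above could be generated roughly as follows. This is only a sketch: the function name `make_sine_dataset`, the clip length, the number of clips, the random phases, and the 22050 Hz sample rate are assumptions (the sample rate is chosen so that 8000 Hz stays below the Nyquist frequency), not the settings actually used in this experiment.

```python
import numpy as np

def make_sine_dataset(n_clips=8, sample_rate=22050, n_samples=16384,
                      f_min=4000.0, f_max=8000.0, seed=0):
    """Return pure-tone clips with one random frequency per clip (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(f_min, f_max, size=n_clips)        # one frequency per clip
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_clips)   # random starting phase
    t = np.arange(n_samples) / sample_rate
    # Broadcasting gives an array of shape (n_clips, n_samples).
    clips = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :] + phases[:, None])
    return clips.astype(np.float32), freqs

clips, freqs = make_sine_dataset()
print(clips.shape, freqs.round(1))
```

Each row could then be written to its own WAV file and used as a training directory.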
So I tried WaveGAN with another dataset (a broadband signal): underwater ambient noise containing rain drops, water splashing, and wave-breaking sounds. It totally fails to converge and merely generates noise.
Therefore, I reached the conclusion that WaveGAN performs well for narrowband signals containing tonal frequencies (such as sharp sounds, drums, boat, bird) but does not perform well for broadband signals such as ambient noise.
I would like to explore why WaveGAN (a CNN-based network) behaves this way.
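One possible way to make the narrowband vs. broadband distinction above measurable is spectral flatness: the ratio of the geometric mean to the arithmetic mean of the power spectrum. It is close to 0 for a pure tone and much higher for noise-like signals. The sketch below uses a synthetic tone and white Gaussian noise as stand-ins; the helper name, the sample rate, and the use of white noise in place of real ambient recordings are all assumptions for illustration.

```python
import numpy as np

def spectral_flatness(signal, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (hypothetical helper)."""
    windowed = signal * np.hanning(len(signal))
    # eps keeps the logarithm finite for bins with near-zero power.
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    return geometric_mean / np.mean(power)

rng = np.random.default_rng(0)
sample_rate = 22050                       # assumed sample rate
t = np.arange(16384) / sample_rate
tone = np.sin(2 * np.pi * 6000.0 * t)     # narrowband stand-in
noise = rng.standard_normal(len(t))       # broadband stand-in (white noise)

flat_tone = spectral_flatness(tone)
flat_noise = spectral_flatness(noise)
print(f"tone flatness: {flat_tone:.2e}, noise flatness: {flat_noise:.2f}")
```

Computing this per clip for each training set would give a rough ranking of how broadband each dataset is, which could then be compared against where training converges.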

That sounds like a reasonable conclusion given your experiments. We mostly only experimented with WaveGAN on datasets with sparse, narrow-band information (e.g. bird songs).

I am not sure why it wouldn't work for wideband signals; sounds like a good research question! Maybe the discriminator has a difficult time distinguishing between the initially noisy signals produced by WaveGAN and the real data, so it's not giving good signal to the generator? Maybe the generator has the wrong inductive bias for producing wideband signals? Not clear to me :)

Thank you for your reply! ^.^