Guideline on test audio files
Mauker1 opened this issue · 8 comments
Hello again!
I've successfully trained SEGAN on the same database used in the original paper, and I also managed to use it to enhance an audio file I recorded with my mic.
But when I tried to test it on another audio file I had on my computer, I ran into this error:
```
Loading model weights...
[*] Reading checkpoints...
[*] Read SEGAN-59750
test wave shape: (4800000,)
test wave min:1.52587890625e-05 max:0.007797360420227051
Traceback (most recent call last):
  File "main.py", line 106, in <module>
    tf.app.run()
  File "C:\Users\mauke\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 97, in main
    c_wave = se_model.clean(wave)
  File "C:\Users\mauke\Documents\git\segan\model.py", line 520, in clean
    x_[:len(x)] = x
ValueError: could not broadcast input array from shape (293,16384) into shape (70,16384)
```
It seems the test audio file isn't quite what the script expects, but I did convert it to a 16 kHz .wav file. So what am I missing? Are there any other requirements for the audio format?

Edit: I used sox to downsample the audio from 44.1 kHz to 16 kHz, the same way it's done in the prepare_data.sh script.
It seems the problem is related to the audio duration. The file I was using is five minutes long; after cropping it down to one minute, it worked.
Is there a duration limit?

Edit: Yeah, the problem was the duration of the audio. The clean method can't handle audio longer than one batch, i.e. my batch size of 70 windows of 2 ** 14 samples each, which is about 71 seconds at 16 kHz (hence the (70, 16384) shape in the error).
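The limit follows from the batch shape in the traceback: one `clean` call fills a `(batch_size, 2 ** 14)` buffer, so it can hold at most `batch_size` windows. A quick sanity check of the arithmetic (assuming `batch_size = 70` and 16 kHz audio, as in the error above):

```python
# One clean() call fills a (batch_size, 2**14) buffer, so it can hold
# at most batch_size windows of 2**14 samples each.
batch_size = 70
window = 2 ** 14        # 16384 samples per window
sample_rate = 16000     # Hz, the rate the model expects

max_samples = batch_size * window
max_seconds = max_samples / sample_rate
print(max_samples, max_seconds)   # 1146880 samples, 71.68 s
```

Anything longer than roughly 71 seconds overflows the batch, which matches the five-minute file failing and the one-minute file working.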
I was using this clean method:
```python
def clean(self, x):
    """ clean a utterance x
        x: numpy array containing the normalized noisy waveform
    """
    # zero pad if necessary
    remainder = len(x) % (2 ** 14)
    if remainder != 0:
        x = np.pad(x, (0, 2 ** 14 - remainder), 'constant', constant_values=0)
    # split files into equal 2 ** 14 sample chunks
    x = np.array(np.array_split(x, int(len(x) / 2 ** 14)))
    x_ = np.zeros((self.batch_size, 2 ** 14))
    x_[:len(x)] = x
    fdict = {self.gtruth_noisy[0]: x_}
    output = self.sess.run(self.Gs[0], feed_dict=fdict)[:len(x)]
    output = output.flatten()
    # remove zero padding if added
    if remainder != 0:
        output = output[:-(2 ** 14 - remainder)]
    return output
```
Once I switched back to the old clean method, it worked on longer files. The only problem is that it got super slow.
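If you want to keep the batched clean shown above, one workaround is to split a long waveform into segments that each fit one batch and clean them one at a time. This is a sketch, not code from the repo: `clean_long` and the identity stand-in for `se_model.clean` are hypothetical.

```python
import numpy as np

CANVAS = 2 ** 14   # SEGAN window length (16384 samples)
BATCH = 70         # batch size the checkpoint was trained with

def clean_long(model_clean, x, batch_size=BATCH, canvas=CANVAS):
    """Split a long waveform into batch-sized segments and clean each one.

    model_clean: a callable with the same contract as se_model.clean
    (takes and returns a 1-D numpy waveform of the same length).
    """
    seg_len = batch_size * canvas          # max samples one batch can hold
    out = []
    for beg in range(0, len(x), seg_len):
        out.append(model_clean(x[beg:beg + seg_len]))
    return np.concatenate(out)

# identity stand-in for se_model.clean, just to exercise the wrapper
fake_clean = lambda seg: seg
wave = np.random.randn(3 * BATCH * CANVAS + 123).astype(np.float32)
enhanced = clean_long(fake_clean, wave)
assert enhanced.shape == wave.shape
```

Each segment still pays the padding cost on its last chunk, but you avoid the broadcast error without falling back to one chunk per session run.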
Hey @Mauker1! Yes, this is very slow; it was a dummy implementation (the easiest thing that could be done, at the cost of wasted resources :/). I have another version of this function that processes the waveform chunk by chunk (the one I used for many later experiments):
```python
def clean(self, x):
    """ clean a utterance x
        x: numpy array containing the normalized noisy waveform
    """
    c_res = None
    for beg_i in range(0, x.shape[0], self.canvas_size):
        if x.shape[0] - beg_i < self.canvas_size:
            length = x.shape[0] - beg_i
            pad = self.canvas_size - length
        else:
            length = self.canvas_size
            pad = 0
        x_ = np.zeros((self.batch_size, self.canvas_size))
        if pad > 0:
            x_[0] = np.concatenate((x[beg_i:beg_i + length], np.zeros(pad)))
        else:
            x_[0] = x[beg_i:beg_i + length]
        print('Cleaning chunk {} -> {}'.format(beg_i, beg_i + length))
        fdict = {self.gtruth_noisy[0]: x_}
        canvas_w = self.sess.run(self.Gs[0],
                                 feed_dict=fdict)[0]
        canvas_w = canvas_w.reshape((self.canvas_size))
        print('canvas w shape: ', canvas_w.shape)
        if pad > 0:
            print('Removing padding of {} samples'.format(pad))
            # get rid of last padded samples
            canvas_w = canvas_w[:-pad]
        if c_res is None:
            c_res = canvas_w
        else:
            c_res = np.concatenate((c_res, canvas_w))
    # deemphasize
    c_res = de_emph(c_res, self.preemph)
    return c_res
```
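Stripped of the TensorFlow session, the bookkeeping in this version boils down to: pad only the final chunk, run one canvas at a time, then trim the pad before concatenating. A standalone sketch of just that logic (the `generate` callable is a hypothetical stand-in for the G network forward pass):

```python
import numpy as np

def clean_streaming(generate, x, canvas_size=2 ** 14):
    """Process x one canvas at a time, padding only the last chunk.

    generate: stand-in for the G network forward pass; here it only
    needs to map a canvas_size-sample array to one of the same shape.
    """
    c_res = None
    for beg_i in range(0, x.shape[0], canvas_size):
        length = min(canvas_size, x.shape[0] - beg_i)
        pad = canvas_size - length
        # pad the (possibly short) last chunk up to a full canvas
        chunk = np.concatenate((x[beg_i:beg_i + length], np.zeros(pad)))
        canvas_w = generate(chunk)
        if pad > 0:
            canvas_w = canvas_w[:-pad]   # drop the padded tail again
        c_res = canvas_w if c_res is None else np.concatenate((c_res, canvas_w))
    return c_res

wave = np.random.randn(40000)             # deliberately not a multiple of 2**14
out = clean_streaming(lambda c: c, wave)  # identity "generator"
assert out.shape == wave.shape
```

Output length always equals input length, so concatenating cleaned segments back together is safe regardless of the file duration.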
Hi @Mauker1, please, I need help with the Loading and Prediction section, which is the last section. I haven't been able to figure it out.

"Then the main.py script has the option to process a wav file through the G network (inference mode), where the user MUST specify the trained weights file and the configuration of the trained network."

Where is this configuration made, and what exactly do I have to alter to make the system work? Thanks.
I've solved that issue, but when I tried to test a sample file, the output audio was completely silent (I couldn't hear any sound). What could the problem have been?

I tested another sample and it worked fine, thanks... all that's left is to test it with my own generated wav files.
What are your Python and TensorFlow versions?
I've been facing a weird issue while testing. I successfully trained the SEGAN model for 19440 iterations with a batch size of 100. During training, at every save_freq the max and min values of the generated sample audios are printed, and almost all of them range from about -0.5 to +0.55.
Now, when testing the same audio file from the training set with the same weights, the output looks like this:
```
test wave min:-0.42119479179382324 max:0.497093141078949
[*] Reading checkpoints...
[*] Read SEGAN-19440
[*] Load SUCCESS
Cleaning chunk 0 -> 16384
gen wave, max: [0.96146643] min: [-0.9862874]
inp wave, max: 0.497093141078949 min: -0.42119479179382324
canvas w shape: (16384, 1)
Cleaning chunk 16384 -> 32768
gen wave, max: [0.9773201] min: [-0.9757471]
inp wave, max: 0.3213702440261841 min: -0.2770885229110718
canvas w shape: (16384, 1)
Cleaning chunk 32768 -> 36480
gen wave, max: [0.99999225] min: [-0.9999961]
inp wave, max: 0.04255741834640503 min: -0.041153550148010254
canvas w shape: (16384, 1)
```
The generated wav sounds even noisier than before, and the speech segments are extremely loud and distorted. I have no idea why this is happening. I'd appreciate some help.
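One thing worth double-checking (a guess, not a confirmed diagnosis): since clean de-emphasizes its output with de_emph(c_res, self.preemph), it assumes the input waveform was pre-emphasized first; feeding a raw waveform through that pipeline would boost the low frequencies and could sound loud and distorted. A sketch of a standard pre/de-emphasis pair (the function bodies and the 0.95 coefficient here are illustrative, not copied from the repo):

```python
import numpy as np

def pre_emph(x, coeff=0.95):
    """High-pass pre-emphasis filter: y[n] = x[n] - coeff * x[n-1]."""
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))

def de_emph(y, coeff=0.95):
    """Inverse of pre_emph: x[n] = y[n] + coeff * x[n-1]."""
    x = np.zeros_like(y)
    x[0] = y[0]
    for n in range(1, len(y)):
        x[n] = y[n] + coeff * x[n - 1]
    return x

# the two filters should be exact inverses of each other
wave = np.random.randn(1024)
roundtrip = de_emph(pre_emph(wave))
assert np.allclose(roundtrip, wave)
```

If the training pipeline pre-emphasized its inputs but the test path does not, that asymmetry alone could explain outputs clipping near ±1 for a quiet input chunk.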