When I tested the sound examples you gave, the enhanced speech was poor
Lynlzz1314 opened this issue · 6 comments
Hi francoisgermain,
When I ran senet_infer.py, I got enhanced speech, but the result was not good.
Hi,
I'm sorry to hear you're having issues. Several other people and I were able to run it successfully before, so I have to ask you for a few more details to be able to help you. Could you walk me through the operations you performed on your machine to get this result, and give me some details on your configuration? Thanks!
First, I stored my own noisy speech files (16 kHz, .wav) in the folder noisy_speech. Then I changed 'valfolder = "dataset/valset_noisy"' to 'valfolder = "noisy_speech"' in the script senet_infer.py. Finally, I ran "python senet_infer.py" and got the folder noisy_speech_denoised, but the denoising did not seem to have worked on the files in that folder.
@Lynlzz1314 you probably have 16-bit integer audio files; you want 32-bit float (pcm_f32le) audio encoding. I don't use sox, but if you have ffmpeg installed, you can try converting your file:
ffmpeg -y -i INPUT.wav -acodec pcm_f32le -ac 1 -ar 16000 -vn OUTPUT.wav
If you do use sox, have a look at the download_sedata.sh file.
@francoisgermain - it would make sense to mention in the readme that the currently trained network expects 32-bit audio files. I think the majority of 16 kHz speech corpora are 16-bit, so there are bound to be a few people who forget to check that.
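For anyone unsure which encoding their files use, here is a minimal sketch that reads the `fmt ` chunk of a WAV header using only the Python standard library. The function name `wav_format` is hypothetical and not part of this repository. A format tag of 1 means integer PCM and 3 means IEEE float, so a file the network can read as-is should report (3, 32).

```python
import struct

def wav_format(path):
    """Return (format_tag, bits_per_sample) read from a RIFF/WAVE header.

    Hypothetical helper, not part of the repository.
    format_tag 1 = integer PCM, 3 = IEEE float.
    """
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(size)
                if len(fmt) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                # Layout: tag, channels, sample rate, byte rate, block align, bits
                tag, _ch, _rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16]
                )
                return tag, bits
            # Skip other chunks; chunk bodies are padded to an even length.
            f.seek(size + (size & 1), 1)
```

A result of (1, 16) indicates a 16-bit integer file that would need converting with one of the commands in this thread.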
I also experienced the same issue. Converting the audio files to 32-bit float is essential for getting good enhancement quality. I used sox to do that:
sox input.wav -r 16000 -b 32 -e float output.wav
Very sorry guys. I never checked the integer data, but you're right that scipy.io.wavfile does not normalize the audio between -1.0 and +1.0. I'll add a note for now since converting to 32-bit float works around the problem, and I'll see if I can include a fix. Thanks for the thorough investigation.
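As an illustration of the kind of fix discussed above, here is a minimal sketch that rescales integer samples, as returned by scipy.io.wavfile.read, to float32 in the range -1.0 to +1.0. The helper name `to_float32` is hypothetical, and where it would be called inside senet_infer.py is an assumption; this is not the author's actual fix.

```python
import numpy as np

def to_float32(samples):
    """Scale integer PCM samples to float32 in [-1.0, 1.0).

    Hypothetical helper, not part of the repository.
    Float input is passed through (cast to float32) unchanged in scale.
    """
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.floating):
        return samples.astype(np.float32)
    if samples.dtype == np.uint8:
        # 8-bit WAV data is unsigned, centered on 128.
        return (samples.astype(np.float32) - 128.0) / 128.0
    # Signed PCM: divide by 2**(bits - 1), e.g. 32768 for int16.
    full_scale = float(2 ** (8 * samples.dtype.itemsize - 1))
    return (samples.astype(np.float64) / full_scale).astype(np.float32)
```

Assuming the script loads audio with `rate, data = scipy.io.wavfile.read(path)`, calling `data = to_float32(data)` right after the read would give 16-bit files the same scale as files already converted to 32-bit float.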
Is there any way to use this on 48 kHz audio directly, without having to resample down to 16 kHz?