When I tested the sound examples you gave, the enhanced speech was poor?

Question

Lynlzz1314 opened this issue 6 years ago · 6 comments

Hi francoisgermain,
When I ran senet_infer.py, I got enhanced speech, but the result was not good.

Answer 1 · 2018-11-09T20:04:47.000Z

Hi,
I'm sorry to hear you're having issues. Several other people and I have been able to run it successfully before, so I have to ask you for a few more details to be able to help you. Could you walk me through the steps you ran on your machine to get this result, and give me some details on your configuration? Thanks!

Answer 2 · 2018-11-11T04:34:09.000Z

First, I stored my own noisy speech files (16 kHz, .wav) in the folder noisy_speech. Then I changed 'valfolder = "dataset/valset_noisy"' to 'valfolder = "noisy_speech"' in the script senet_infer.py. Finally, I ran "python senet_infer.py" and got the folder noisy_speech_denoised. But the files in noisy_speech_denoised did not sound enhanced; the denoising algorithm didn't seem to work.

Answer 3 · 2019-01-03T00:02:49.000Z

@Lynlzz1314 you probably have 16-bit audio files; you want the pcm_f32le audio encoding. I don't use sox, but if you have ffmpeg installed, you can try converting your file:

ffmpeg -y -i INPUT.wav -acodec pcm_f32le -ac 1 -ar 16000 -vn OUTPUT.wav

If you do use sox, have a look at the download_sedata.sh file.

@francoisgermain - it would make sense to mention in the readme that the currently trained network expects 32-bit float audio files. I think the majority of 16 kHz speech corpora are 16-bit, so there are bound to be a few people who forget to check that.

Answer 4 · 2019-03-04T17:22:11.000Z

I also experienced the same issue. Converting audio files to 32-bit float is essential for getting good enhancement quality. I used sox to do that:
sox input.wav -r 16000 -b 32 -e float output.wav

Answer 5 · 2019-03-23T23:33:15.000Z

Very sorry, guys. I never checked the integer data, but you're right that scipy.io.wavfile does not normalize the audio to between -1.0 and +1.0. I'll add a note for now, since converting to 32-bit float works around the problem, and I'll see if I can include a fix. Thanks for the thorough investigation.
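For anyone who would rather keep their integer WAV files, here is a minimal sketch of the normalization discussed above. It assumes the input is read with scipy.io.wavfile.read, which returns integer PCM samples in their raw integer range. The helper names to_float32 and load_wav_normalized are hypothetical and are not part of the repository; this is not the fix that was planned for senet_infer.py, only an illustration of the idea.

```python
import numpy as np


def to_float32(samples):
    """Scale WAV samples to float32 in roughly [-1.0, 1.0] (hypothetical helper)."""
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.floating):
        # Float WAVs are assumed to be in [-1.0, 1.0] already.
        return samples.astype(np.float32)
    if samples.dtype == np.uint8:
        # 8-bit PCM is unsigned and centred on 128.
        return (samples.astype(np.float32) - 128.0) / 128.0
    # Signed integer PCM (int16, int32): divide by the magnitude
    # of the most negative representable value, e.g. 32768 for int16.
    scale = float(-np.iinfo(samples.dtype).min)
    return samples.astype(np.float32) / scale


def load_wav_normalized(path):
    """Read a WAV file and return (sample_rate, float32 samples)."""
    # Imported here so that to_float32 can be used without SciPy installed.
    from scipy.io import wavfile

    rate, samples = wavfile.read(path)
    return rate, to_float32(samples)
```

With this, a 16-bit file and its pcm_f32le conversion should give nearly the same array, so the network sees input on the scale it was trained on either way.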

Answer 6 · 2020-09-28T05:21:08.000Z

Is there any way to use this on 48 kHz audio directly, without having to resample down to 16 kHz?