santi-pdp/segan

Can we change the sampling frequency of audio waves for training?

imran7778 opened this issue · 6 comments

Dear @santi-pdp

This model down-samples any input audio wave to 16 kHz. In my case, I want to train the model on a dataset of waves sampled at 100 kHz. Please guide me on how to change the code for this.

My input training and testing waves are sampled at 100 kHz, and I don't want to down-sample my dataset to 16 kHz before training, as this model currently does.

Please guide.

Regards
Imran Ahmed

I have the same problem. My dataset is 8 kHz; is it necessary to re-sample it to 16 kHz? Can someone explain the underlying theory?
Please guide.
qian

You should change the make_tfrecords script, which is the one that makes the chunks of waveforms under the assumption of 16 kHz (https://github.com/santi-pdp/segan/blob/master/make_tfrecords.py#L43). I haven't tried frequencies above 16 kHz, as the chunks would have to be larger to cover the same receptive field, but the same training scripts should work once you change the "chunk-maker" script that generates the tfrecords.
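For reference, a minimal sketch of the kind of change involved, with the expected sample rate made configurable instead of hard-coded to 16 kHz (the function name, arguments, and exact check here are illustrative assumptions, not the verbatim repo code):

```python
import numpy as np
from scipy.io import wavfile

def read_and_slice(filename, canvas_size=2**14, expected_rate=8000):
    """Read a wav and cut it into fixed-size chunks for the tfrecords."""
    rate, wav_data = wavfile.read(filename)
    if rate != expected_rate:
        # corresponds to the rate assumption fixed at 16000 in the original script
        raise ValueError('Expected {} Hz, got {} Hz'.format(expected_rate, rate))
    # slice the signal into non-overlapping chunks of canvas_size samples
    n_chunks = len(wav_data) // canvas_size
    return np.array([wav_data[i * canvas_size:(i + 1) * canvas_size]
                     for i in range(n_chunks)])
```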

Thank you for your answer!
I will try to train the model on my 8 kHz dataset with the sample-rate check in make_tfrecords.py removed. The purpose of my training is to improve the performance of speaker recognition in real-world environments; I hope it works!

qian

Hi @santi-pdp
I have another question. Before testing, the format of the wav is:
sample rate: 16 kHz
precision: 16-bit
sample encoding: 16-bit signed integer PCM

After testing, the format is:
sample rate: 16 kHz
precision: 25-bit
sample encoding: 32-bit floating point PCM

I wonder what caused these changes? Will these changes make any difference to the wav?

Please guide!
qian

This is because the wavfile.write function writes the normalized [-1, 1] waveform with this encoding instead of re-scaling it to 16-bit precision. You can use the soundfile library instead, which can write 16-bit PCM directly if you want, or use `sox <infile.wav> -r 16k -b 16 <outfile.wav>` to convert the already-written wav.
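For instance, a minimal sketch with soundfile (the file names are placeholders; this assumes the test script wrote a float wav with values in [-1, 1]):

```python
import soundfile as sf

# read the 32-bit float wav produced by the test script
data, sr = sf.read('enhanced.wav')

# rewrite it as 16-bit signed integer PCM at the same sample rate
sf.write('enhanced_16bit.wav', data, sr, subtype='PCM_16')
```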

Hi @santi-pdp
Thank you very much for your answer!
Now I can train an 8 kHz model following your suggestion!

I found that the 8 kHz model trains faster than the 16 kHz model. Is that because canvas_size is 2**14? I wonder: can I make canvas_size smaller to train a better 8 kHz model?
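If I understand correctly, canvas_size counts raw samples, so the chunk duration at each rate works out like this (a quick back-of-the-envelope check, assuming non-overlapping chunks):

```python
# seconds per training chunk = canvas_size / sample_rate
print(2**14 / 16000.0)  # 1.024 s at 16 kHz (the default setting)
print(2**14 / 8000.0)   # 2.048 s at 8 kHz, so each chunk covers twice the time
print(2**13 / 8000.0)   # 1.024 s at 8 kHz with canvas_size halved to 2**13
```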

In short, thanks a lot for your guidance!
Best wishes!
qian