xiph/rnnoise

5000000*87, what does (87-42) mean?

maggie0830 opened this issue · 3 comments

As we know, denoise.c extracts 42 features, but when we run "denoise_training speech.pcm noise.pcm" we get a 5000000*87 feature matrix. The 42 features are included in that matrix, so what do the remaining (87-42) columns mean? Thanks.

Each row holds 42 extracted features + 22 expected gains + 22 noise log-spectrum values + 1 VAD value = 87, which is why we get a 5000000*87 matrix.
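To make the column layout concrete, here is a minimal sketch that splits each 87-value row into the four groups listed above. It assumes the training output is a headerless stream of little-endian float32 values, one 87-value row after another; the constant and function names are chosen for illustration and are not taken from the repository.

```python
import struct

# Column layout as described above: 42 + 22 + 22 + 1 = 87.
# These names are illustrative, not taken from the rnnoise sources.
NB_FEATURES = 42
NB_BANDS = 22
ROW_WIDTH = NB_FEATURES + 2 * NB_BANDS + 1  # 87 values per row


def read_rows(raw):
    """Yield rows of 87 floats from a bytes buffer.

    Assumes headerless little-endian float32 data; trailing bytes that
    do not fill a whole row are ignored.
    """
    row_size = ROW_WIDTH * 4  # 4 bytes per float32
    usable = len(raw) - (len(raw) % row_size)
    for offset in range(0, usable, row_size):
        yield struct.unpack(f"<{ROW_WIDTH}f", raw[offset:offset + row_size])


def split_row(row):
    """Split one row into (features, gains, noise_log_spectrum, vad)."""
    if len(row) != ROW_WIDTH:
        raise ValueError(f"expected {ROW_WIDTH} values, got {len(row)}")
    features = row[:NB_FEATURES]                             # columns 0..41
    gains = row[NB_FEATURES:NB_FEATURES + NB_BANDS]          # columns 42..63
    noise_log_spectrum = row[NB_FEATURES + NB_BANDS:ROW_WIDTH - 1]  # 64..85
    vad = row[ROW_WIDTH - 1]                                 # column 86
    return features, gains, noise_log_spectrum, vad
```

With a file on disk, the bytes could be loaded with `open(path, "rb").read()` and passed to `read_rows`; for a matrix of this size, reading in chunks would be kinder to memory.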

There isn't much I can find on training. I want to train for 8 kHz narrowband audio, and I'm interested in training on the MS-SNSD dataset combined with the various noise datasets shared in the demo.

Can you help with the following questions?

  1. Do I need to downsample all datasets to 8 kHz, or do I just need to downsample them to match the 16 kHz samples in MS-SNSD?
  2. Should the signal.raw mentioned in TRAINING-README contain noise, or should I use the clean audio files provided in the MS-SNSD clean_train folder as is?
  3. denoise_training takes only one signal.raw and one noise.raw, so how can I run it on multiple files, given that it overwrites training.f32 every time? Do I need to combine all clean audio files into one file and all noise files into another?

Hi, what does the denoise_training file do? I can't seem to open it to check what it does. Thanks.