sample rate converter
dromer opened this issue · 10 comments
At the moment the processing is fixed at 44.1 kHz, but many systems run at 48 kHz (or even higher). See #2 (comment).
It would be great if we were able to run at other sample rates. Does this require training with other datasets?
Yes, that is something I'm interested in implementing. Right now I believe you would need to train using a 48k dataset (have not tested this), but I'm currently looking into using r8brain for internal samplerate conversion. The idea is if you change the samplerate in the plugin to anything other than 44.1k, it would downsample to 44.1k for processing the neural net model, then upsample back to the output samplerate. I need to verify the latency in doing that.
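The round trip described above (host rate -> 44.1k for the model -> back to the host rate) can be sketched roughly as follows. This is only an illustration under stated assumptions: `resample_linear` is a naive linear-interpolation stand-in, not r8brain and not suitable for real audio (it has no anti-aliasing filter), and `process_block` and `model_fn` are hypothetical names that do not come from the plugin. It also ignores the latency a real resampler would add.

```python
from fractions import Fraction

MODEL_RATE = 44_100  # rate the current models are trained at, per the thread


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (placeholder only, no anti-alias filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        # Nothing to interpolate between; return a copy unchanged.
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    step = Fraction(src_rate, dst_rate)  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        idx = min(int(pos), last - 1)        # clamp so idx + 1 stays in range
        frac = min(float(pos - idx), 1.0)    # clamp at the final input sample
        out.append(samples[idx] * (1.0 - frac) + samples[idx + 1] * frac)
    return out


def process_block(block, host_rate, model_fn, model_rate=MODEL_RATE):
    """Run model_fn at model_rate, whatever rate the host is using (hypothetical helper)."""
    if host_rate == model_rate:
        return model_fn(list(block))
    at_model_rate = resample_linear(block, host_rate, model_rate)
    processed = model_fn(at_model_rate)
    out = resample_linear(processed, model_rate, host_rate)
    # Pad or trim so the host gets back exactly as many samples as it sent.
    if len(out) < len(block):
        out.extend([out[-1] if out else 0.0] * (len(block) - len(out)))
    return out[:len(block)]
```

For example, `process_block(block, 48_000, model_fn)` with a 480-sample block hands 441 samples to `model_fn` and returns 480 samples to the host.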
Hmmm, sounds to me that having a 48k trained model would certainly be the more performant option.
(considering many of us are looking at running this on embedded targets, there are more resource constraints)
I agree, no samplerate conversion would be ideal. I'll do some testing for both options and share the results here.
@GuitarML what's the decrease in performance from 44.1 to 48 on the Stateful LSTM? For example, does it make sense to run NeuralPi at 24kHz and add downsample/upsample blocks around it? Guitar is pretty dead at 12kHz...
@MaxPayne86 I wouldn't want to go below 44.1kHz, just to keep sound quality high. I'm not opposed to testing out 24kHz though, more information on how the models perform at different samplerates would be good. I haven't tested 48kHz models on the raspberry pi hardware yet, but I'll post the results here when I do. On the Rpi4, sushi is reporting 16% cpu usage running one neural net model at 44.1k.
@GuitarML I didn't mention aliasing, sorry. So we could low-pass the input at 12kHz and run the network at 48kHz for a 4x oversample. Does that sound good to you? Seems lowering the sample rate to 24kHz is not a good idea... I'm curious how commercial neural network plugins handle that. I didn't find anything in the available literature. What do you think?
@MaxPayne86 Neural DSP uses something they call "anti-derivative trigonometric interpolation". They run the neural model at a certain sample rate, but convert to the desired sample rate using that algorithm.
source: https://neuraldsp.com/news/a-new-audio-engine-powering-neural-dsp-plugins
@mishushakov @GuitarML okay, so the processing chain could be something like
upsample -> (lowpass) -> neural @??? -> downsample
For the upsample/downsample blocks, zita-resampler is one option; I don't know how it performs against JUCE's own implementation.
NOTE: if the input is 44.1kHz, the first stage does 44.1 -> 48. If the input is 96kHz, the first stage is a downsample block and the second one an upsample block.
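The note above amounts to choosing the direction of each stage from the ratio between the host rate and the model rate. A minimal sketch of that decision, assuming the model runs at 48 kHz as in the note (the chain itself leaves the rate open as `???`); `conversion_plan` is a hypothetical helper, not part of zita-resampler or JUCE:

```python
from fractions import Fraction


def conversion_plan(host_rate, model_rate=48_000):
    """Describe the two resampling stages around the model (illustrative only)."""
    ratio = Fraction(model_rate, host_rate)  # reduced automatically, e.g. 160/147
    if ratio > 1:
        first, second = "upsample", "downsample"
    elif ratio < 1:
        first, second = "downsample", "upsample"
    else:
        first = second = "bypass"
    return {
        "first_stage": first,
        "second_stage": second,
        "up": ratio.numerator,     # interpolation factor of the first stage
        "down": ratio.denominator  # decimation factor of the first stage
    }
```

With a 44.1 kHz host the first stage is a rational 160/147 upsample, while a 96 kHz host only needs a plain 1/2 downsample; the second stage applies the inverse ratio.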
Nyquist alone doesn't cut it. That is a common misunderstanding. For complex signals there is a real-world advantage to running 88, 96 or even 192kHz sample rates natively, due to microdynamics, not just oversampling for anti-aliasing. Believe it or not, things sound a lot better. (Otherwise we would still be listening to non-HD media and there would be no advantage in high sample rates at all, considering no one older than 18 is likely to hear anything above 22kHz.)
Still, 48kHz is considerably higher quality than 44.1; the difference is even more noticeable than the next step from 48kHz to 88. German public broadcasters now archive analog media of all kinds at 192kHz, and believe me, they would not do so if there were no benefit. Just think of the cost of several times the storage and power for that insane amount of data. There is some serious academic knowledge behind that. In other sampling domains, sample rates higher than what is "sufficient considering the Nyquist frequency" are also very common, for good reason.
I would also have a blast using the plugin for more delicate tasks, like mic pre models for vocals. Just my thoughts on this topic.
PS: maybe Neural DSP uses something esoteric internally? I remember reading a paper by audio converter guru Frederic Forsell, in which he argued for a 60kHz sample rate as a reasonable theoretical optimum for sampling and processing audio, at a time when high sample rates meant much higher costs for building high-quality converters...