psiphi75/sonogram

Check amplitude at higher frequencies

Closed this issue · 5 comments

juho commented

As per @psiphi75's comment, I'm raising an issue to look at the amplitude at the higher frequencies.

Here's a new test set that I generated with Audacity at amplitude 1 (I made the previous one with Live's Operator, which I think applied its own filtering).

It has the following tones (in Hz):
100, 200, 300, 400, 500, 600, 700, 800, 900, 1k, 2k, 3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k, 12k, 13k, 14k, 15k, 16k, 17k, 18k, 19k, 20k, 21k and 22k.

  • 44.1k 16bit 100Hz-22kHz: has a ~1dB dip at 16kHz. This is how Audacity generated it at amplitude 1, so I'm unsure whether filtering is messing with it.
  • 44.1k 16bit 100Hz-22kHz with 16kHz normalized to 0dB peak: same, but with 16kHz at 0dB peak. Its RMS seems to rise higher than the others.
  • 96k 32bit 100Hz-22kHz: same, but at 96k
  • 96k 32bit 100Hz-22kHz with 16kHz normalized to 0dB peak
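For comparing the peak and RMS levels of the tones listed above, here is a minimal Rust sketch. It is not taken from sonogram or Audacity; `sine_tone`, `peak_db` and `rms_db` are hypothetical helper names, and levels are expressed in dB relative to a full scale of 1.0.

```rust
use std::f64::consts::PI;

/// Generate `len` samples of a sine tone at `freq_hz` with amplitude 1.0.
/// The phase is computed in f64 to limit rounding error at high sample counts.
fn sine_tone(freq_hz: f64, sample_rate: f64, len: usize) -> Vec<f32> {
    (0..len)
        .map(|n| (2.0 * PI * freq_hz * n as f64 / sample_rate).sin() as f32)
        .collect()
}

/// Peak level in dB relative to full scale (1.0).
fn peak_db(samples: &[f32]) -> f32 {
    let peak = samples.iter().fold(0.0_f32, |m, s| m.max(s.abs()));
    20.0 * peak.log10()
}

/// RMS level in dB relative to full scale (1.0).
fn rms_db(samples: &[f32]) -> f32 {
    let mean_sq = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
    10.0 * mean_sq.log10()
}
```

A full-scale sine has an RMS of 1/√2, so `rms_db(&sine_tone(1_000.0, 44_100.0, 44_100))` should come out close to -3.01 dB. Tones whose RMS differs noticeably from that at the same peak level are worth a closer look.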

100Hz-22kHz 0dB.zip

So it seems there are a few issues. I'm working through them from beginning to end, essentially in the following steps:

  1. Fix the input. The input values should be in the range -1.0 to 1.0. That's more of a documentation issue. It makes the "scale" option redundant for most use cases: if you have a 16 bit audio sample, loading it will do the conversion. Values outside the range of -1.0 to 1.0 will still work if a user wants that.
  2. Ensure the spectrogram is calculated correctly at all frequencies (this issue, #7).
  3. Ensure the output of the spectrogram is correct.
  4. There is also a lot of code tidy-up to do.
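As an illustration of step 1, here is a minimal sketch of converting 16 bit samples into the -1.0 to 1.0 range. The function name `i16_to_f32` and the divisor of 32768 are assumptions for illustration; the crate's own loader may scale differently.

```rust
/// Convert signed 16 bit PCM samples to f32.
/// Dividing by 32768 (an assumed convention) maps i16::MIN to exactly -1.0
/// and i16::MAX to just under 1.0.
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32_768.0).collect()
}
```

With input already scaled like this, a separate "scale" option is only needed when a caller deliberately wants values outside that range.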
juho commented

Alright! I think expecting f32 samples is pretty normal these days. I'm feeding it through the _f32 function in wasm as well. Thanks for the help.

I see the problem. During the mapping of the spectrogram frequency axis from linear to log, each output point is taken from a single discrete frequency, so the result is badly aliased. The solution is to integrate over a given frequency range. I've refactored the code to begin this work. This will also allow for other frequency scales: not just log and linear, but Mel and others.
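To make the idea of integrating over a frequency range concrete, here is a rough sketch. It is not the code from the refactor; `integrate_bins` and `linear_to_log` are hypothetical names, and for simplicity it assumes that bin `i` of the linear spectrum covers the range from `i * bin_hz` up to `(i + 1) * bin_hz`.

```rust
/// Average the linear-frequency bins of `spectrum` between the fractional
/// bin positions `lo` and `hi`. Partially covered bins are weighted by the
/// fraction of the bin that lies inside the range.
fn integrate_bins(spectrum: &[f32], lo: f32, hi: f32) -> f32 {
    let hi = hi.min(spectrum.len() as f32);
    let lo = lo.max(0.0).min(hi);
    if hi <= lo {
        // Degenerate range: fall back to the single bin containing `lo`.
        let idx = (lo as usize).min(spectrum.len().saturating_sub(1));
        return spectrum.get(idx).copied().unwrap_or(0.0);
    }
    let mut sum = 0.0;
    let mut pos = lo;
    while pos < hi {
        let idx = pos.floor() as usize;
        let next = ((idx + 1) as f32).min(hi);
        sum += spectrum[idx] * (next - pos);
        pos = next;
    }
    sum / (hi - lo)
}

/// Map a linear spectrum onto `bands` log-spaced bands between `f_min` and
/// `f_max` (both in Hz, with `f_min` greater than zero).
fn linear_to_log(spectrum: &[f32], bin_hz: f32, f_min: f32, f_max: f32, bands: usize) -> Vec<f32> {
    let ratio = (f_max / f_min).ln();
    (0..bands)
        .map(|b| {
            let f_lo = f_min * (ratio * b as f32 / bands as f32).exp();
            let f_hi = f_min * (ratio * (b + 1) as f32 / bands as f32).exp();
            integrate_bins(spectrum, f_lo / bin_hz, f_hi / bin_hz)
        })
        .collect()
}
```

The difference from point sampling shows up at the top of the log scale, where one output band spans many linear bins: a narrow peak anywhere inside the band still contributes to the output instead of being skipped.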

I'm also looking at more robust FFT solutions, for both better performance and accuracy. There are two candidates: rustfft and microfft. microfft appears more portable but possibly less maintained, while rustfft is more popular and faster. Both could be options in the future, and you would be able to select either using a feature flag.

This is improved, but it's not perfect. Previously a lot of information was lost, since the mapping only looked at one specific frequency. Now it integrates across each frequency step.

Also, it's easy now to add new frequency mappings. Currently it's still log and linear, but it would be simple to add more.
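As a sketch of what a pluggable frequency mapping could look like (this is not the crate's actual design; the `FreqScale` trait, the three structs and `band_edges` are hypothetical), each scale only needs to turn a position along the output axis into a frequency. The Mel conversion uses the common 2595 * log10(1 + f / 700) formula.

```rust
/// Hypothetical trait: maps a position in 0.0..=1.0 along the output axis
/// to a frequency in Hz between `f_min` and `f_max`.
trait FreqScale {
    fn freq_at(&self, pos: f32, f_min: f32, f_max: f32) -> f32;
}

struct Linear;
struct Log;
struct Mel;

impl FreqScale for Linear {
    fn freq_at(&self, pos: f32, f_min: f32, f_max: f32) -> f32 {
        f_min + pos * (f_max - f_min)
    }
}

impl FreqScale for Log {
    // Requires f_min > 0.
    fn freq_at(&self, pos: f32, f_min: f32, f_max: f32) -> f32 {
        f_min * (f_max / f_min).powf(pos)
    }
}

fn hz_to_mel(f: f32) -> f32 {
    2595.0 * (1.0 + f / 700.0).log10()
}

fn mel_to_hz(m: f32) -> f32 {
    700.0 * (10.0_f32.powf(m / 2595.0) - 1.0)
}

impl FreqScale for Mel {
    // Interpolate linearly in Mel space, then convert back to Hz.
    fn freq_at(&self, pos: f32, f_min: f32, f_max: f32) -> f32 {
        let (m_lo, m_hi) = (hz_to_mel(f_min), hz_to_mel(f_max));
        mel_to_hz(m_lo + pos * (m_hi - m_lo))
    }
}

/// Band edges (in Hz) for `bands` output bands on the given scale.
fn band_edges(scale: &dyn FreqScale, f_min: f32, f_max: f32, bands: usize) -> Vec<f32> {
    (0..=bands)
        .map(|i| scale.freq_at(i as f32 / bands as f32, f_min, f_max))
        .collect()
}
```

Under this arrangement, adding another scale means adding one more `impl FreqScale`, and the integration step can stay the same because it only consumes the band edges.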

juho commented

Cool! I'll give it a whirl! Thanks for checking into it.