earthspecies/spectral_hyperresolution

Can we reproduce the high resolution spectrogram provided by Marcello?


aza commented

I’ve asked Marcelo for the audio file used to generate the image above.

It would be very good to work towards reproducing that image.

I tried running linear_reassignment on the freesound data. With parameters where something is visible, it takes ~8 seconds to process a file. The implementation crashes on some files; I haven't looked into why yet.

This is the notebook I used.
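To get comparable per-file runtimes and a list of the files that fail, a small harness could time each file and collect errors instead of stopping at the first one. This is a minimal sketch, not code from the notebook. `profile_files` is a hypothetical helper, and the `load` and `transform` callables are placeholders you would supply. Only Python exceptions are caught. If the process itself dies (for example, killed for running out of memory), each file would need to run in a subprocess instead.

```python
import time
import traceback
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


def profile_files(
    paths: Iterable[Hashable],
    load: Callable[[Hashable], Any],
    transform: Callable[[Any], Any],
) -> Tuple[Dict[Hashable, float], Dict[Hashable, str]]:
    """Time `transform` on each input and record exceptions instead of stopping.

    Only the transform is timed, not the loading. Exceptions raised while
    loading or transforming are stored as formatted tracebacks. A hard crash
    of the interpreter is NOT caught here.
    """
    timings: Dict[Hashable, float] = {}
    failures: Dict[Hashable, str] = {}
    for path in paths:
        try:
            signal = load(path)
            start = time.perf_counter()
            transform(signal)
            timings[path] = time.perf_counter() - start
        except Exception:
            failures[path] = traceback.format_exc()
    return timings, failures
```

In the notebook, `load` could be a thin wrapper around `soundfile.read`. `transform` could be a one-argument wrapper around the reassignment call with fixed noct/over/tdeci values. Both wrappers are hypothetical here.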

I am starting to think that the hyperparameters I use may be nowhere near the values needed to generate a high-resolution spectrogram like the one in the paper. It would be great to learn what values were used for noct and over (and, to a lesser extent, tdeci). I also wonder what the runtime of the algorithm is in MATLAB on a GPU: there is a for loop at the core of the algorithm, which is concerning for performance.

Knowing the runtime in MATLAB on both the GPU and the CPU, and the range of parameter values used, would be super helpful.
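The thread doesn't say exactly what that loop iterates over. Assuming it runs over frequency channels, it can often be replaced by one broadcast array operation, which is also what makes a GPU port attractive. The sketch below shows this on a stand-in filterbank: constant-Q Gaussian band-pass filters applied in the FFT domain. This kernel is a hypothetical choice, not the reassignment kernel from Marcelo's code. The same output is computed once with a Python loop and once vectorized. The vectorized version trades memory (an n_channels × n_samples complex array) for speed.

```python
import numpy as np


def gaussian_bank(n: int, sr: float, centers: np.ndarray, q: float) -> np.ndarray:
    """Frequency responses (n_centers x n) of Gaussian band-pass filters.

    Bandwidth is center / q, i.e. constant Q (hypothetical kernel choice).
    Negative frequencies are zeroed, so each filtered output is analytic.
    """
    freqs = np.fft.fftfreq(n, d=1.0 / sr)
    widths = centers / q
    bank = np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    bank[:, freqs < 0] = 0.0
    return bank


def filter_loop(x: np.ndarray, sr: float, centers: np.ndarray, q: float) -> np.ndarray:
    """One inverse FFT per channel, inside a Python loop."""
    spectrum = np.fft.fft(x)
    bank = gaussian_bank(len(x), sr, centers, q)
    out = np.empty((len(centers), len(x)), dtype=complex)
    for k in range(len(centers)):
        out[k] = np.fft.ifft(spectrum * bank[k])
    return out


def filter_vectorized(x: np.ndarray, sr: float, centers: np.ndarray, q: float) -> np.ndarray:
    """Same result: broadcast the spectrum against every filter, invert all rows at once."""
    spectrum = np.fft.fft(x)
    bank = gaussian_bank(len(x), sr, centers, q)
    return np.fft.ifft(spectrum[None, :] * bank, axis=1)
```

Whether the real loop can be restructured like this depends on what each iteration depends on. If iterations feed into each other, this approach does not apply.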

librosa also dies on this data

aza commented

I know that Marcelo used three different wavelets and merged them to make that graphic: the red, green, and blue channels each use a different setting, and the non-artifact signal is where they agree (white).
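The merge described here could look roughly like the sketch below. It assumes three representations already on the same time-frequency grid and already scaled to [0, 1]. How Marcelo normalized each channel isn't known. Each representation becomes one colour channel, so a pixel is white only where all three are bright. The per-pixel minimum then serves as a simple "agreement" score. `merge_rgb` and `agreement` are hypothetical names.

```python
import numpy as np


def merge_rgb(red: np.ndarray, green: np.ndarray, blue: np.ndarray) -> np.ndarray:
    """Stack three same-shaped intensity maps in [0, 1] into an (H, W, 3) RGB image."""
    if not (red.shape == green.shape == blue.shape):
        raise ValueError("channels must share one time-frequency grid")
    return np.clip(np.stack([red, green, blue], axis=-1), 0.0, 1.0)


def agreement(rgb: np.ndarray) -> np.ndarray:
    """Brightness shared by all three channels (1 = white, 0 = at least one channel dark)."""
    return rgb.min(axis=-1)
```

The result can be displayed with `matplotlib.pyplot.imshow(rgb, origin="lower", aspect="auto")` so that low frequencies sit at the bottom.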


Here are an image and the audio file it was generated from, both from the communication with Marcelo.

I'll hack together a notebook that goes from that audio file to a 3-channel image; I might not complete it today.
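Before the reassignment code is wired in, a quick placeholder for the three channels is three ordinary STFT power spectrograms with Gaussian windows of different widths. Narrow windows favour time resolution and wide ones favour frequency resolution. This swaps plain STFTs in for Marcelo's three wavelet settings, so it only exercises the audio-to-3-channel plumbing. Keeping `nperseg` and the hop fixed puts all three on one grid so they can be stacked as R, G, B. The window widths and the `three_window_channels` name are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import stft


def three_window_channels(x, sr, stds=(16.0, 64.0, 256.0), nperseg=2048, hop=256):
    """Power spectrograms of x computed with three Gaussian windows of different std.

    All use the same nperseg and hop, so the returned arrays share one
    (frequency x time) grid. Stand-in for three wavelet settings.
    """
    channels = []
    f = t = None
    for std in stds:
        f, t, z = stft(x, fs=sr, window=("gaussian", std), nperseg=nperseg,
                       noverlap=nperseg - hop)
        channels.append(np.abs(z) ** 2)
    return f, t, channels
```

Each channel would still need intensity scaling before stacking into an image.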

matlab and python notebooks

The MATLAB (using code from Marcelo) and Python results look similar. The missing ingredient might be how the results are plotted (the scale of pixel intensities).
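One common way to choose pixel intensities is to convert power to decibels, keep a fixed dynamic range below the peak, and map that range to [0, 1]. This is a minimal sketch of that. It is an assumption about what could make the images match, not how Marcelo's figure was produced. The 60 dB default is arbitrary.

```python
import numpy as np


def to_display(power: np.ndarray, dynamic_range_db: float = 60.0) -> np.ndarray:
    """Map power to [0, 1] on a dB scale.

    The peak maps to 1; everything more than `dynamic_range_db` below the
    peak is clipped to 0. Zeros are replaced by the smallest positive float
    so the logarithm stays finite.
    """
    eps = np.finfo(float).tiny
    db = 10.0 * np.log10(np.maximum(power, eps))
    floor = db.max() - dynamic_range_db
    return np.clip((db - floor) / dynamic_range_db, 0.0, 1.0)
```

Trying a few dynamic ranges and colour maps (for example `cmap="magma"` in `matplotlib.pyplot.imshow`) on the same output may show whether the remaining difference is only in display.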

This is the latest I got from the python notebook.


  • there is a lot of information there
  • the pixel values may be on a different scale, with a different color map
  • what is most concerning is the frequency axis of the Python spectrogram: if the image that was shared with us has a linear y axis, ours looks like it is on a log scale

Here is one of the channels from the TIFF image, for comparison:

The Python version took over 2 minutes to generate. I was a bit worried this would be the case: with hyperparameters that give such results, these spectrograms will require a lot of compute.

In a sense these are all good results: there is a path forward, and it will just require a bit more work. Isolating and tracking down the y axis scale issue seems like the next focus.
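One way to tell which frequency axis an image uses is to look at a harmonic stack. Harmonics k·f0 sit at evenly spaced rows on a linear axis. On a log axis they sit at rows proportional to log2(k), so the gaps shrink as k grows. The sketch below predicts the row of each harmonic under either assumption. Measuring the gaps in Marcelo's image and in ours, and comparing them against these predictions, could settle the question. `harmonic_rows` is a hypothetical helper, and the axis limits passed to it must match the image being inspected.

```python
import numpy as np


def harmonic_rows(f0: float, n_harmonics: int, n_rows: int,
                  fmin: float, fmax: float, scale: str) -> np.ndarray:
    """Row index (0 = bottom) at which each harmonic k * f0 would appear."""
    freqs = f0 * np.arange(1, n_harmonics + 1)
    if scale == "linear":
        pos = (freqs - fmin) / (fmax - fmin)
    elif scale == "log":
        pos = np.log2(freqs / fmin) / np.log2(fmax / fmin)
    else:
        raise ValueError("scale must be 'linear' or 'log'")
    return pos * (n_rows - 1)
```

Constant gaps between harmonic rows point to a linear axis; gaps that shrink upward point to a log axis.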

In the MATLAB file that was shared with us, it says that the algorithm is a

frequency-line reassignment algorithm for frequency logscale constant Q

Could it be that the visualizations were created with a different version of the algorithm? Either way, we can switch to a linear scale; that shouldn't be much of a problem.

It turns out that going from log-scale frequency to linear is a bit more involved than I thought. I still think it is within reach, but getting it right might take quite a bit of work. I emailed Marcelo the generated representation, asking whether he could shed some light on the scale of the frequency axis in his representation. If he replies and my suspicion turns out to be correct, I'll ask whether he would be kind enough to share that code with us.

This would be the simplest way forward on this and would give me the best chance of getting the implementation with linear frequency of the y axis right.
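While waiting for a reply, a naive way to get a linear y axis from the existing log-frequency output is to interpolate each time column onto a linear frequency grid. This sketch assumes row k is centred at fmin · 2^(k/noct). Reading noct as bins per octave is an assumption about the parameter's meaning. The sketch also shows why a proper conversion is more involved. Low frequencies are dense on the log axis but sparse on the linear one, so narrow low-frequency lines can fall between linear sample points. At high frequencies, interpolation smears energy between sparse log rows. An energy-conserving conversion would instead sum each log bin's energy into the linear bins it overlaps.

```python
import numpy as np


def log_to_linear(spec_log: np.ndarray, fmin: float, noct: float,
                  n_linear: int, fmax: float = None):
    """Resample rows of a log-frequency spectrogram onto a linear frequency axis.

    Assumes row k of `spec_log` is centred at fmin * 2**(k / noct).
    Each time column is linearly interpolated (np.interp); energy is NOT
    conserved, this only changes where values are drawn.
    """
    n_rows, n_cols = spec_log.shape
    f_log = fmin * 2.0 ** (np.arange(n_rows) / noct)  # geometric (constant-Q) centres
    if fmax is None:
        fmax = f_log[-1]
    f_lin = np.linspace(fmin, fmax, n_linear)
    out = np.empty((n_linear, n_cols))
    for j in range(n_cols):
        out[:, j] = np.interp(f_lin, f_log, spec_log[:, j])
    return f_lin, out
```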

Well, maybe this is not exactly done, but I think the intent was to verify that our implementation is correct, which I think has been achieved.