Can we reproduce the high resolution spectrogram provided by Marcello?

Question

Can we reproduce the high resolution spectrogram provided by Marcello?

Closed this issue 5 years ago · 12 comments

bs commented 5 years ago

https://www.dropbox.com/s/1473non261zf9p9/dolphin%20hyper.png?dl=0

radekosmulski commented 5 years ago

done

Answer 1 · 2019-12-06T17:05:34.000Z

I’ve asked Marcelo for the audio file that the image above was generated from.

Answer 2 · 2019-12-06T17:22:46.000Z

It would be very good to work towards reproducing that image.

I tried running linear_reassignment on the freesound data. With parameters where something is visible, it takes ~8 seconds to process a file. The implementation crashes on some files; I haven't looked into why yet.

This is the notebook I used.

I am starting to think that the hyperparameters I use may be nowhere near where they should be to generate a high resolution spectrogram like the one in the paper. It would be great to learn what values were used for noct and over (and, to a lesser extent, tdeci). I also wonder what the runtime of the algorithm is in matlab on a GPU. There is a for loop at the center of the algorithm, which is concerning.

Knowing the runtime in matlab on both the GPU and the CPU, and the range of values used, would be super helpful.

Answer 3 · 2019-12-06T17:42:42.000Z

librosa also dies on this data

Answer 4 · 2019-12-06T19:09:41.000Z

I know that Marcelo used three different wavelets and merged them to make that graphic (the red, green, and blue channels have three different settings). It’s where they agree (white) that the non-artifact signal is.
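A minimal sketch of that merging idea, assuming the three spectrograms have already been computed with the three settings and share the same shape. The function name `merge_to_rgb` and the per-channel peak normalization are assumptions for illustration, not Marcelo's actual code:

```python
import numpy as np

def merge_to_rgb(spec_r, spec_g, spec_b):
    """Stack three equally shaped, non-negative spectrograms into an RGB image.

    Hypothetical helper: each channel is scaled to [0, 1] by its own peak,
    so bins where all three settings agree come out close to white.
    """
    channels = []
    for spec in (spec_r, spec_g, spec_b):
        spec = np.asarray(spec, dtype=float)
        peak = spec.max()
        # An all-zero channel stays zero instead of dividing by zero.
        channels.append(spec / peak if peak > 0 else np.zeros_like(spec))
    return np.stack(channels, axis=-1)

# Toy data: the three "settings" agree only on bin (1, 2).
a = np.zeros((4, 4)); a[1, 2] = 1.0; a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 2] = 1.0; b[3, 3] = 1.0
c = np.zeros((4, 4)); c[1, 2] = 1.0
rgb = merge_to_rgb(a, b, c)
```

In the toy data the bin at (1, 2) ends up white, while (0, 0) is pure red because only the first setting has energy there.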


Answer 5 · 2019-12-06T21:48:29.000Z

Here is an image, and the audio file it was generated from, from the communication with Marcelo.

I'll hack together a notebook for going from that audio file to a 3 channel image, though I might not complete it today.

Answer 6 · 2019-12-06T22:27:55.000Z

matlab and python notebooks

The matlab (using code from Marcelo) and python results look similar. The missing ingredient might be how the results are plotted (the scale of pixel intensities).
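As a rough illustration of what "scale of pixel intensities" could mean here, the sketch below maps power values to a decibel scale with a fixed dynamic range before plotting. The helper name `to_display_db` and the 60 dB range are assumptions; nothing in the thread confirms that the original image was produced this way:

```python
import numpy as np

def to_display_db(power, dynamic_range_db=60.0, eps=1e-12):
    """Map non-negative power values to [0, 1] on a dB scale.

    Hypothetical helper: the peak maps to 1.0, and everything more than
    `dynamic_range_db` below the peak is clipped to 0.0.
    """
    power = np.asarray(power, dtype=float)
    db = 10.0 * np.log10(np.maximum(power, eps))  # eps avoids log10(0)
    db = db - db.max()                            # peak sits at 0 dB
    db = np.clip(db, -dynamic_range_db, 0.0)
    return (db + dynamic_range_db) / dynamic_range_db

power = np.array([[1.0, 1e-3], [1e-6, 1e-9]])
pixels = to_display_db(power)
```

With linear scaling, three of those four values would be indistinguishable from black; on the dB scale the value 30 dB below the peak lands at mid-gray.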

Answer 7 · 2019-12-06T22:49:59.000Z

This is the latest I got from the python notebook.

There is a lot of information there.
Maybe the scale of pixel values is different, or the color map is different.
What is most concerning is the frequency axis of the spectrogram in Python: if the image that was shared with us has a linear y axis, then ours looks like it is on a log scale.

Answer 8 · 2019-12-06T22:59:08.000Z

Here is one of the channels from the tiff image for comparison

The Python version took over 2 minutes to generate. I was a bit worried that this could be the case: with hyperparameters that give such results, these spectrograms will require a lot of compute.

In a sense these are all good results. There is a path forward and it will just require a bit more work. Isolating and tracking down the y axis scale issue seems like the next focus point.

Answer 9 · 2019-12-06T23:10:14.000Z

In the matlab file that was shared with us it says that the algorithm is

frequency-line reassignment algorithm for frequency logscale constant Q

Could it be that the visualizations were created with a different version of the algorithm? Anyhow, we can go to a linear scale; that shouldn't be much of a problem.

Answer 10 · 2019-12-07T19:05:38.000Z

Turns out going from logscale frequency to linear is a bit more involved than I thought. I still think it is within reach, but getting it right might require quite a bit of work. I sent an email to Marcelo with the generated representation, asking if he could shed some light on the scale of the frequency axis in the representation he shared. If he replies and my suspicion ends up being correct, I'll ask him if he would be so kind as to share the code with us.

This would be the simplest way forward on this and would give me the best chance of getting the implementation with linear frequency of the y axis right.
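For reference, the naive version of the log-to-linear conversion only resamples an already computed image along the frequency axis. The sketch below does that with plain linear interpolation; the function name `log_to_linear_freq` and the toy frequencies are assumptions. It does not change the reassignment algorithm itself, and interpolating a sparse reassigned spectrogram can smear or drop narrow ridges, which is part of why a proper linear-frequency implementation is more involved:

```python
import numpy as np

def log_to_linear_freq(spec, log_freqs, n_linear_bins):
    """Resample rows of `spec` from log-spaced to linearly spaced frequencies.

    Hypothetical helper. `spec` has shape (n_freq, n_time) and `log_freqs`
    holds the increasing center frequency of each row.
    """
    spec = np.asarray(spec, dtype=float)
    log_freqs = np.asarray(log_freqs, dtype=float)
    lin_freqs = np.linspace(log_freqs[0], log_freqs[-1], n_linear_bins)
    out = np.empty((n_linear_bins, spec.shape[1]))
    for t in range(spec.shape[1]):
        # Linear interpolation of each time frame onto the new grid.
        out[:, t] = np.interp(lin_freqs, log_freqs, spec[:, t])
    return lin_freqs, out

# Toy data: one bin per octave from 100 Hz to 1600 Hz, and each row's
# value equals its own frequency so the result is easy to check.
log_freqs = 100.0 * 2.0 ** np.arange(5)
spec = log_freqs[:, None] * np.ones((1, 3))
lin_freqs, lin_spec = log_to_linear_freq(spec, log_freqs, 16)
```

On a linear axis the top octave (800 to 1600 Hz) takes up more than half of the rows, whereas on the log axis it is one row out of five, which is the kind of visual difference described above.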

Answer 11 · 2019-12-19T06:06:55.000Z

well, maybe this is not exactly done, but I think the intent was to verify that our implementation is correct, which I think has been achieved