johnmartinsson/bird-species-classification

Signal / Noise Separation

Add a method which separates structured sound from noise.

Input: time-frequency data
Output: Intervals with noise, intervals with structure, intervals with neither.

Implementation details:

Signal mask

  • compute the spectrogram of the whole wave file
    • pass the signal through an STFT, using a Hanning window function (size 512, 75% overlap).
    • normalize: divide every element by the maximum value, so that all values lie in [0, 1].
  • select all pixels in the spectrogram that are larger than three times the row median and larger than three times the column median. Set these pixels to 1, all others to 0.
  • apply a binary erosion and dilation filter; a 4 by 4 filter produced the best results.
  • create indicator vector with as many elements as there are columns in spectrogram
    • set the i-th element of the indicator to 1 if the i-th column contains at least one 1, otherwise set it to 0
    • smooth indicator vector by applying two more binary dilation filters (4 by 1).
  • scale indicator vector to the length of the original sound file.
  • use scaled indicator vector as mask to extract signal.
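
The signal-mask steps above can be sketched roughly as follows. This is a minimal illustration assuming NumPy and SciPy, not the code in this repository: the function names `indicator_vector`, `scale_indicator` and `extract_signal` are hypothetical, erosion is applied before dilation, and the nearest-column lookup used for scaling is an assumption, since the description does not say how the scaling is done.

```python
import numpy as np
from scipy import ndimage, signal


def indicator_vector(wave, threshold=3.0, nperseg=512, overlap=0.75):
    """Return a boolean indicator with one entry per spectrogram column."""
    # STFT with a Hann window of size 512 and 75% overlap.
    _, _, stft = signal.stft(wave, window="hann", nperseg=nperseg,
                             noverlap=int(nperseg * overlap))
    spec = np.abs(stft)
    # Normalize so that all values lie in [0, 1].
    peak = spec.max()
    if peak > 0:
        spec = spec / peak
    # Keep pixels larger than `threshold` times both row and column median.
    row_median = np.median(spec, axis=1, keepdims=True)
    col_median = np.median(spec, axis=0, keepdims=True)
    mask = (spec > threshold * row_median) & (spec > threshold * col_median)
    # Binary erosion, then dilation, with a 4 by 4 structuring element.
    square = np.ones((4, 4), dtype=bool)
    mask = ndimage.binary_erosion(mask, structure=square)
    mask = ndimage.binary_dilation(mask, structure=square)
    # A column is marked if it contains at least one 1.
    indicator = mask.any(axis=0)
    # Smooth with two more binary dilations (4 by 1).
    return ndimage.binary_dilation(indicator, structure=np.ones(4, dtype=bool),
                                   iterations=2)


def scale_indicator(indicator, n_samples):
    """Stretch the indicator to n_samples entries (assumed: nearest column)."""
    idx = np.arange(n_samples) * len(indicator) // n_samples
    return indicator[idx]


def extract_signal(wave):
    """Keep only the samples that the scaled indicator marks as signal."""
    mask = scale_indicator(indicator_vector(wave, threshold=3.0), len(wave))
    return wave[mask]
```

Calling `extract_signal(wave)` on a 1-D float array returns the concatenated samples marked as signal; keeping the mask from `scale_indicator` instead would preserve the positions of the intervals.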

Noise mask
Same as the signal mask, but select pixels larger than 2.5 times the row/column median, and invert the indicator vector at the end.
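
As a rough illustration of how the two thresholds interact, the sketch below runs both masks on a toy spectrogram. It assumes NumPy and SciPy; `column_indicator` is a hypothetical helper and the toy values are made up for the example. Columns that pass the 2.5 threshold but not the 3.0 threshold end up in neither mask.

```python
import numpy as np
from scipy import ndimage


def column_indicator(spec, threshold):
    """Boolean indicator per spectrogram column for a given median threshold."""
    row_median = np.median(spec, axis=1, keepdims=True)
    col_median = np.median(spec, axis=0, keepdims=True)
    mask = (spec > threshold * row_median) & (spec > threshold * col_median)
    # Erosion then dilation with a 4 by 4 structuring element.
    square = np.ones((4, 4), dtype=bool)
    mask = ndimage.binary_dilation(ndimage.binary_erosion(mask, square), square)
    # Collapse to columns and smooth with two 4 by 1 dilations.
    indicator = mask.any(axis=0)
    return ndimage.binary_dilation(indicator, np.ones(4, dtype=bool), iterations=2)


# Toy normalized spectrogram: flat background, one strong and one weak patch.
spec = np.full((64, 100), 0.1)
spec[10:20, 30:50] = 1.0    # 10x the median: passes both thresholds
spec[10:20, 60:80] = 0.28   # 2.8x the median: passes 2.5 but not 3.0

signal = column_indicator(spec, threshold=3.0)
noise = ~column_indicator(spec, threshold=2.5)   # inverted at the end
neither = ~signal & ~noise                       # weak patch lands here
```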

Everything else is considered to contain no relevant information. The use of dilation filters ensures that the number of generated intervals is kept to a minimum.
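
To show why dilation keeps the interval count low, here is a small sketch that turns an indicator vector into intervals before and after smoothing. `intervals` and `dilate` are hypothetical helpers, and the 4-wide dilation via `np.convolve` is only a stand-in for the binary dilation filter described above.

```python
import numpy as np


def intervals(indicator):
    """Return half-open (start, stop) index pairs for each run of 1s."""
    padded = np.concatenate(([0], np.asarray(indicator, dtype=int), [0]))
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)
    stops = np.flatnonzero(change == -1)
    return list(zip(starts.tolist(), stops.tolist()))


def dilate(indicator, width=4):
    """1-D binary dilation: each 1 spreads over `width` neighbouring entries."""
    spread = np.convolve(np.asarray(indicator, dtype=int),
                         np.ones(width, dtype=int), mode="same")
    return (spread > 0).astype(int)


raw = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(intervals(raw))               # [(1, 2), (4, 6), (12, 13)]
print(len(intervals(dilate(raw))))  # 2: the first two runs merge
```

Nearby fragments merge into one interval, while the isolated run at the end stays separate.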

There are still problems with the scaling of the indicator vector.

The scaling problem is now resolved.