grrrr/nsgt

Real-time streamed SliCQ

Closed this issue · 6 comments

Hi, thanks for this amazing piece of software and the research!

I'm curious whether the sliCQ can be used on streamed, real-time samples (e.g. from a microphone). It would be great if it can (and if there is an example). In particular, I'm curious about the setup that should be used in a streamed scenario.

Thanks in advance!

Lin commented

Hi Thomas,

Thanks a lot for the reply. I had looked at the example but was misled, as it just processes a whole file--thanks for clarifying!

I'll try out some configurations--but I take it no experiments have been done so far with live streamed microphone audio that would suggest parameters working well w.r.t. load, latency and other issues? In particular, I'm not sure what microphone block length, slice length and transition length to use. I'm aiming to minimise latency given an octave scale with fmin = 130.8 Hz (C3), fmax = 22050 Hz and 12 bins/octave.

Thanks and regards,
Lin

Lin commented

Hi!

Thanks for the quick reply!

I'm not sure what the implication of the calculation is--tracing back:

```
(4096/44100) / (1/130.8) = 12.15
floor(12.15) = 12
(4096/44100) / (1/f) = 12  =>  f = 129.2 Hz
```
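
For reference, this arithmetic counts how many periods of fmin fit into one slice. A minimal Python check, assuming (as the numbers suggest) a slice length of 4096 samples at fs = 44100:

```python
fs = 44100      # sample rate (Hz)
sllen = 4096    # slice length in samples (assumed from the numbers above)
fmin = 130.8    # C3 (Hz)

periods = (sllen / fs) * fmin   # periods of fmin per slice
print(periods)                  # -> ~12.15
print(int(periods))             # -> 12

# the frequency whose period fits exactly 12 times into one slice
f = 12 * fs / sllen
print(f)                        # -> ~129.2 Hz
```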

What implication does this number have? I understand that the CQT is still subject to the time-frequency uncertainty principle, and that if the slice length is too short, the low frequencies will have poor frequency resolution.

Further, the paper states that the slices overlap--how does this affect real-time slice-wise processing? I assume the transition length is the extent of the overlap between the windows. If the microphone sends in one slice at a time, should the transition length be 0? Or should the microphone send in a block of a few slices and use a non-zero transition length (which I presume will cause further issues at block boundaries)? (One possible streaming setup is sketched after this message.)

Thanks again,
Lin
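
On the overlap question above: here is a minimal sketch of feeding a block stream through the sliced transform. It assumes the NSGT_sliced constructor signature (scale, slice length, transition area, sample rate) from this repository, and that forward() re-blocks an arbitrarily-chunked input internally, so the caller does not need to align blocks to slice boundaries; mic_blocks and the parameter values are placeholders.

```python
import numpy as np
from nsgt import NSGT_sliced, OctScale

fs = 44100
sllen, trlen = 4096, 1024            # example slice / transition lengths

scale = OctScale(130.8, 22050, 12)   # fmin = C3, fmax = 22050 Hz, 12 bins/octave
slicq = NSGT_sliced(scale, sllen, trlen, fs, real=True, matrixform=True)

def mic_blocks(blocksize=1024, nblocks=32):
    """Hypothetical stand-in for a microphone callback: yields contiguous blocks."""
    for _ in range(nblocks):
        yield np.random.randn(blocksize)  # replace with real captured audio

# forward() consumes the block iterable lazily and yields one coefficient
# slice at a time; the slice overlap and transition windowing are handled
# inside the transform rather than by the caller.
for coeffs in slicq.forward(mic_blocks()):
    pass  # process each coefficient slice here
```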

sevagh commented

Here's an example I created which seems to work fine: https://github.com/sevagh/Music-Separation-TF/blob/master/algorithms/HPSS_CQNSGT_realtime.py

Much like a "realtime STFT" would involve a realtime input stream of $hop-sized chunks (say 1024 samples) and a ring buffer of 2*$hop samples--i.e. a window/frame size of 2*$hop--storing the last two hops received from the stream, I adapted the same idea with trlen (transition area length) as the hop and sllen (slice length) as the window/frame size.
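
A minimal sketch of that buffering scheme (hops_from_stream and the lengths are hypothetical stand-ins; the real input source would yield trlen-sample chunks):

```python
import numpy as np

sllen, trlen = 4096, 1024   # window/frame = slice length, hop = transition length

def hops_from_stream(nhops=16):
    """Hypothetical stand-in for the realtime input: yields trlen-sample chunks."""
    for _ in range(nhops):
        yield np.random.randn(trlen)

buf = np.zeros(sllen)       # ring buffer holding the last sllen samples

for hop in hops_from_stream():
    # shift the buffer left by one hop and append the newly received samples
    buf = np.concatenate((buf[trlen:], hop))
    # ... hand `buf` (one full sllen-sample slice) to the sliced transform ...
```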

The latency is rather high, though, and the small slice length makes it impossible to realize the constant Q-factor for all desired bins (12 bins per octave on the octave scale):

/home/sevagh/.local/lib/python3.8/site-packages/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 80.00,84.74,89.75,95.07,100.69,106.66,112.97,119.66,126.74,134.24,142.19,150.61,159.53,168.97,178.97,189.57,200.79,212.68,225.27,238.61,252.73,267.69,283.54,300.33,318.11,336.94,356.89,378.02,400.40,424.10,449.21,475.80,503.97,533.80,565.41,598.88,634.33,671.89,711.66
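
Some back-of-the-envelope arithmetic (mine, not derived from the warning itself) on why those bins fail: with 12 bins per octave the Q-factor is about 16.8, and a constant-Q window at frequency f needs a time support of roughly Q/f seconds, which for the listed frequencies exceeds what a short slice can hold:

```python
fs = 44100
bpo = 12
Q = 1 / (2**(1/bpo) - 1)         # ~16.8 for 12 bins/octave

for f in (80.0, 130.8, 711.66):  # lowest, fmin, and highest warned frequency
    support_s = Q / f            # window time support needed (seconds)
    print(f"{f:7.2f} Hz -> {support_s*1000:6.1f} ms ({support_s*fs:6.0f} samples)")

#   80.00 Hz -> ~210 ms (~9270 samples)
#  130.80 Hz -> ~129 ms (~5670 samples)
#  711.66 Hz ->  ~24 ms (~1042 samples)
```

The ~1040-sample support at the 711.66 Hz cutoff would be consistent with the usable window length being capped at a fraction of this slice length, though that reading of the warning is an assumption on my part.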

Edit: I also step through the wav audio in hops to simulate a stream, but the general code should be adaptable to a true realtime input stream.

@sevagh Amazing--thanks for this!

Tangential, but this could make it simpler: https://librosa.org/doc/latest/generated/librosa.stream.html
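
For example, something along these lines (the file name and block handling are placeholders; setting frame_length == hop_length yields contiguous, non-overlapping blocks):

```python
import librosa

path = 'input.wav'              # placeholder file
sr = librosa.get_samplerate(path)

# Stream the file in contiguous 1024-sample blocks without loading it whole.
stream = librosa.stream(path,
                        block_length=1,
                        frame_length=1024,
                        hop_length=1024)

for block in stream:
    pass  # feed each block into the realtime pipeline here
```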