grrrr/nsgt

Real-time streamed SliCQ

Closed this issue · 6 comments

Hi, thanks for this amazing piece of software and the research!

I'm curious whether the sliCQ can be used on streamed, real-time samples (e.g. from a microphone). It would be great if it can (and if there is an example). In particular, I'm curious about the setup that should be used in a streamed scenario.

Thanks in advance!

Lin commented

Hi Thomas,

Thanks a lot for the reply. I had looked at the example but was misled, as it just processes a whole file--thanks for clarifying!

I'll try out some configurations--but I take it no experiments have been done so far with live streamed microphone audio that would suggest parameters working well w.r.t. load, latency and other issues? In particular, I'm not sure what microphone block length, slice length and transition length to use. I'm aiming to minimise latency given an octave scale with fmin = 130.8 Hz (C3), fmax = 22050 Hz and 12 bins/octave.

Thanks and regards,
Lin

Lin commented

Hi!

Thanks for the quick reply!

I'm not sure what the implication of the calculation is--tracing back:

```
(4096/44100) / (1/130.8) = 12.15
floor(12.15) = 12
(4096/44100) / (1/f) = 12  =>  f = 129.2 Hz
```
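
For reference, this arithmetic counts how many periods of fmin fit into one slice. A minimal Python check, assuming (as the numbers suggest) a slice length of 4096 samples at fs = 44100:

```python
fs = 44100      # sample rate (Hz)
sllen = 4096    # slice length in samples (assumed from the numbers above)
fmin = 130.8    # C3 (Hz)

periods = (sllen / fs) * fmin   # periods of fmin per slice
print(periods)                  # -> ~12.15
print(int(periods))             # -> 12

# the frequency whose period fits exactly 12 times into one slice
f = 12 * fs / sllen
print(f)                        # -> ~129.2 Hz
```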

What implication does this number have? I understand that the CQT is still subject to the time-frequency uncertainty principle, and that if the slice length is too short, the low frequencies will have poor frequency resolution.

Further, the paper states that the slices overlap--how does this affect real-time slice-wise processing? I assume the transition length is the extent of the overlap between the windows. If the microphone sends in one slice at a time, should the transition length be 0? Or should the microphone send in a block of a few slices and use a non-zero transition length (which I presume will cause further issues at block boundaries)? (One possible streaming setup is sketched after this message.)

Thanks again,
Lin
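
On the overlap question above: here is a minimal sketch of feeding a block stream through the sliced transform. It assumes the NSGT_sliced constructor signature (scale, slice length, transition area, sample rate) from this repository, and that forward() re-blocks an arbitrarily-chunked input internally, so the caller does not need to align blocks to slice boundaries; mic_blocks and the parameter values are placeholders.

```python
import numpy as np
from nsgt import NSGT_sliced, OctScale

fs = 44100
sllen, trlen = 4096, 1024            # example slice / transition lengths

scale = OctScale(130.8, 22050, 12)   # fmin = C3, fmax = 22050 Hz, 12 bins/octave
slicq = NSGT_sliced(scale, sllen, trlen, fs, real=True, matrixform=True)

def mic_blocks(blocksize=1024, nblocks=32):
    """Hypothetical stand-in for a microphone callback: yields contiguous blocks."""
    for _ in range(nblocks):
        yield np.random.randn(blocksize)  # replace with real captured audio

# forward() consumes the block iterable lazily and yields one coefficient
# slice at a time; the slice overlap and transition windowing are handled
# inside the transform rather than by the caller.
for coeffs in slicq.forward(mic_blocks()):
    pass  # process each coefficient slice here
```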

sevagh commented

Here's an example I created which seems to work fine: https://github.com/sevagh/Music-Separation-TF/blob/master/algorithms/HPSS_CQNSGT_realtime.py

Much like a "realtime STFT" would involve a realtime input stream of $hop-sized chunks (say 1024 samples) and a ring buffer of 2*$hop samples--i.e. a window/frame size of 2*$hop--storing the last two hops received from the stream, I adapted the same idea with trlen (transition area length) as the hop and sllen (slice length) as the window/frame size.
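
A minimal sketch of that buffering scheme (hops_from_stream and the lengths are hypothetical stand-ins; the real input source would yield trlen-sample chunks):

```python
import numpy as np

sllen, trlen = 4096, 1024   # window/frame = slice length, hop = transition length

def hops_from_stream(nhops=16):
    """Hypothetical stand-in for the realtime input: yields trlen-sample chunks."""
    for _ in range(nhops):
        yield np.random.randn(trlen)

buf = np.zeros(sllen)       # ring buffer holding the last sllen samples

for hop in hops_from_stream():
    # shift the buffer left by one hop and append the newly received samples
    buf = np.concatenate((buf[trlen:], hop))
    # ... hand `buf` (one full sllen-sample slice) to the sliced transform ...
```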

The latency is rather high, though, and the small slice length makes it impossible to realize the constant Q-factor for all desired bins (12 bins per octave on the octave scale):

/home/sevagh/.local/lib/python3.8/site-packages/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 80.00,84.74,89.75,95.07,100.69,106.66,112.97,119.66,126.74,134.24,142.19,150.61,159.53,168.97,178.97,189.57,200.79,212.68,225.27,238.61,252.73,267.69,283.54,300.33,318.11,336.94,356.89,378.02,400.40,424.10,449.21,475.80,503.97,533.80,565.41,598.88,634.33,671.89,711.66
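
Some back-of-the-envelope arithmetic (mine, not derived from the warning itself) on why those bins fail: with 12 bins per octave the Q-factor is about 16.8, and a constant-Q window at frequency f needs a time support of roughly Q/f seconds, which for the listed frequencies exceeds what a short slice can hold:

```python
fs = 44100
bpo = 12
Q = 1 / (2**(1/bpo) - 1)         # ~16.8 for 12 bins/octave

for f in (80.0, 130.8, 711.66):  # lowest, fmin, and highest warned frequency
    support_s = Q / f            # window time support needed (seconds)
    print(f"{f:7.2f} Hz -> {support_s*1000:6.1f} ms ({support_s*fs:6.0f} samples)")

#   80.00 Hz -> ~210 ms (~9270 samples)
#  130.80 Hz -> ~129 ms (~5670 samples)
#  711.66 Hz ->  ~24 ms (~1042 samples)
```

The ~1040-sample support at the 711.66 Hz cutoff would be consistent with the usable window length being capped at a fraction of this slice length, though that reading of the warning is an assumption on my part.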

Edit: I also step through the wav audio in hops to simulate a stream, but the general code should be adaptable to a true realtime input stream.

@sevagh Amazing--thanks for this!

Tangential, but this could make it simpler: https://librosa.org/doc/latest/generated/librosa.stream.html
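
For example, something along these lines (the file name and block handling are placeholders; setting frame_length == hop_length yields contiguous, non-overlapping blocks):

```python
import librosa

path = 'input.wav'              # placeholder file
sr = librosa.get_samplerate(path)

# Stream the file in contiguous 1024-sample blocks without loading it whole.
stream = librosa.stream(path,
                        block_length=1,
                        frame_length=1024,
                        hop_length=1024)

for block in stream:
    pass  # feed each block into the realtime pipeline here
```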