grrrr/nsgt

Variable-Q transform


I notice that the Mel scale construction leads to a variable Q per octave. Can we do something similar to create a VQ-NSGT, i.e. a variable Q per octave?
It's mentioned in a few places: https://www.isca-speech.org/archive/interspeech_2015/papers/i15_2744.pdf

grrrr commented

Cool - I learned of an interesting technique for a simple variable-Q transform. I created a class called VQLogScale:

+class VQLogScale(Scale):
+    def __init__(self, fmin, fmax, bnds, gamma=0, beyond=0):
+        """
+        @param fmin: minimum frequency (Hz)
+        @param fmax: maximum frequency (Hz)
+        @param bnds: number of frequency bands (int)
+        @param gamma: frequency offset (Hz) added to every band center, which makes Q vary at low frequencies
+        @param beyond: number of frequency bands below fmin and above fmax (int)
+        """
+        Scale.__init__(self, bnds+beyond*2)
+        lfmin = np.log2(fmin)
+        lfmax = np.log2(fmax)
+        odiv = (lfmax-lfmin)/(bnds-1)
+        lfmin_ = lfmin-odiv*beyond
+        lfmax_ = lfmax+odiv*beyond
+        self.fmin = 2**lfmin_
+        self.fmax = 2**lfmax_
+        self.pow2n = 2**odiv
+        self.gamma = gamma
+
+    def F(self, bnd=None):
+        return self.fmin*self.pow2n**(bnd if bnd is not None else np.arange(self.bnds)) + self.gamma

Gamma is an offset frequency that is added to each band's center frequency. A small value is picked, e.g. 25 Hz (I saw this in two papers: 1, 2), which is meant to widen the bands at low frequencies. For example, here's a VQLogScale compared to a LogScale for a 12-bin CQ-NSGT:

>>> from nsgt import LogScale, VQLogScale
>>>
>>> scl1 = LogScale(20, 22050, 12)
>>> scl2 = VQLogScale(20, 22050, 12, gamma=25)  # 25 Hz offset
>>>
>>>
>>> import librosa
>>> bands1, qs1 = scl1()
>>> bands2, qs2 = scl2()
>>> _ = [print('{0:.2f} {1}'.format(b, librosa.hz_to_note(b))) for b in bands1]
20.00 D♯0
37.81 D♯1
71.48 D2
135.14 C♯3
255.48 C4
482.98 B4
913.08 A♯5
1726.19 A6
3263.39 G♯7
6169.48 G8
11663.50 F♯9
22050.00 F10
>>>
>>> _ = [print('{0:.2f} {1}'.format(b, librosa.hz_to_note(b))) for b in bands2]
45.00 F♯1
62.81 B1
96.48 G2
160.14 E3
280.48 C♯4
507.98 B4
938.08 A♯5
1751.19 A6
3288.39 G♯7
6194.48 G8
11688.50 F♯9
22075.00 F10
>>>
>>> _ = [print('{0:.2f}'.format(q)) for q in qs1]
0.77
0.77
0.77
0.77
0.77
0.77
0.77
0.77
0.77
0.77
0.77
0.77
>>> _ = [print('{0:.2f}'.format(q)) for q in qs2]
1.77
1.30
1.06
0.93
0.86
0.83
0.81
0.80
0.79
0.79
0.79
0.79
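The numbers above can be cross-checked without nsgt or librosa. The following is a minimal sketch, assuming that Q for a scale without its own Q method is taken as F / (2 * dF/dbnd), i.e. the center frequency divided by twice its slope over the band index. That definition is an assumption chosen because it is consistent with the printed values, and the function name `vq_log_scale` is made up for this illustration.

```python
import math

def vq_log_scale(fmin, fmax, bnds, gamma=0.0):
    """Return (center frequencies, Q factors) for a log scale shifted by gamma Hz."""
    odiv = math.log2(fmax / fmin) / (bnds - 1)  # octaves between neighbouring bands
    pow2n = 2.0 ** odiv                         # frequency ratio between neighbouring bands
    freqs = [fmin * pow2n ** b + gamma for b in range(bnds)]
    # Assumed Q definition: F / (2 * dF/dbnd).
    # gamma shifts F but leaves the slope dF/dbnd unchanged.
    slopes = [fmin * pow2n ** b * math.log(pow2n) for b in range(bnds)]
    qs = [f / (2.0 * s) for f, s in zip(freqs, slopes)]
    return freqs, qs

freqs, qs = vq_log_scale(20, 22050, 12, gamma=25)
for f, q in zip(freqs, qs):
    print(f"{f:9.2f} Hz  Q = {q:.2f}")
```

Under this assumption the offset only matters where gamma is comparable to the band frequency: Q starts near 1.77 in the lowest band and settles to about 0.79 once the band frequencies are much larger than 25 Hz.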

This actually gives some nice results in my music source separation experiments.

There is a more rigorous VQ-NSGT description here:

For each octave of the input signal’s relevant frequency range, it allows the choice of a desired number of frequency bins and corresponding filters, hence an adapted frequency resolution.

I need to think more deeply about how I would implement the above. E.g.

class VQOctScale:
    def __init__(self, fmin, fmax, bpos, beyond=0):
        """
        @param fmin: minimum frequency (Hz)
        @param fmax: maximum frequency (Hz)
        @param bpos: list of bins per octave (bpo), one entry per octave, so len(bpos) = total number of octaves
        @param beyond: number of frequency bands below fmin and above fmax (int)
        """
        raise NotImplementedError  # interface sketch only
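To make the per-octave idea more concrete, here is a rough sketch of how the band frequencies could be laid out. It is not an nsgt implementation: the helper names `vq_oct_freqs` and `q_for_bpo` are hypothetical, `fmax` is implied as `fmin * 2**len(bpos)`, `beyond` is ignored, and Q uses the textbook constant-Q relation 1 / (2**(1/bpo) - 1) rather than anything taken from the library.

```python
def vq_oct_freqs(fmin, bpos):
    """Center frequencies for octaves starting at fmin, with bpos[i] bins in octave i."""
    freqs = []
    for octave, bpo in enumerate(bpos):
        base = fmin * 2.0 ** octave  # lower edge of this octave
        # bpo geometrically spaced bins covering [base, 2 * base)
        freqs.extend(base * 2.0 ** (k / bpo) for k in range(bpo))
    return freqs

def q_for_bpo(bpo):
    """Textbook constant-Q factor for bpo bins per octave (assumed definition)."""
    return 1.0 / (2.0 ** (1.0 / bpo) - 1.0)

# Example: 12 bins in the first octave above 55 Hz, 24 bins in the second.
bpos = [12, 24]
freqs = vq_oct_freqs(55.0, bpos)
qs = [q_for_bpo(bpo) for bpo in bpos for _ in range(bpo)]
```

Each octave gets its own frequency resolution, so Q is constant within an octave and jumps at octave boundaries. A real scale class would also have to decide how to smooth those jumps and how to handle a range that is not a whole number of octaves.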

This paper: https://www.cs.tut.fi/sgn/arg/CQT/schoerkhuber-aes-2014.pdf contains a similar variable-Q definition that simply adds an offset frequency gamma to widen the lower bands. This should be good enough.

Hi @sevagh ,
Could you share the titles of the two articles above? I couldn't open the links. Thanks in advance.

r: https://www.isca-speech.org/archive/interspeech_2015/papers/i15_2744.pdf

r: https://www.cs.tut.fi/sgn/arg/CQT/schoerkhuber-aes-2014.pdf