grrrr/nsgt

What are the dimensions of the NSGT_sliced output

sevagh opened this issue · 2 comments

I get the following shape from the NSGT:

```python
import numpy as np
from nsgt import MelScale, NSGT_sliced

# `audio` is a stereo signal of shape (samples, 2) sampled at 44100 Hz

# 125 frequency bins
scl = MelScale(78, 22050, 125)

nsgt = NSGT_sliced(scl, 9216, 2304, 44100, real=True, matrixform=True, multichannel=True)

forward = np.asarray(list(nsgt.forward((audio.T,)))).astype(np.complex64)

# Shape of forward:
#
# - T is the number of frames (slices) in time. By what division? Is it "sllen + (0 <= n <= trlen)"?
#   I believe the hop size and/or trlen (transition area) is expected to vary to maintain perfect invertibility.
# - 2 channels, because the audio is stereo.
# - The second-to-last dimension is the requested number of frequency bins + 1, so 126.
# - The last dimension is nsgt.coef_factor*sllen = 304. What does this dimension represent?
#
# => T x (2 channels) x 126 x 304
```

By analogy with an STFT, which typically has a shape like I x F x T (channels x frequency bins x time frames), what are the two frequency dimensions of the NSGT_sliced output?
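To compare against the STFT layout, the axes can be reordered on a dummy array of the reported shape. This is a minimal sketch under one assumption that is not confirmed in this issue: that the last axis (304) runs along time within a slice, i.e. one coefficient sequence per frequency bin, rather than being a second frequency axis. `T = 10` is a hypothetical slice count.

```python
import numpy as np

# Hypothetical stand-in for the reported output:
# T slices x 2 channels x 126 bins x 304 coefficients per bin and slice.
T, C, F, M = 10, 2, 126, 304
forward = np.zeros((T, C, F, M), dtype=np.complex64)

# Assumption: axis 3 is time inside each slice. Moving the slice axis next
# to it gives channel x frequency x slice x within-slice time, i.e. the
# STFT's I x F x T layout with the time axis split into two.
stft_like = np.transpose(forward, (1, 2, 0, 3))
print(stft_like.shape)  # (2, 126, 10, 304)
```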

Are the 126 frequency bins interpolated or duplicated to create a bigger vector of 304 values?

Is there a recommended way to collapse the 126 and 304 dimensions into a single one (simple concatenation?), so that I can pass around rectangular "T x F" time-frequency matrices?
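A minimal sketch of the "simple concatenation" option, assuming the 304 axis is within-slice time so that slices can be laid end to end along time. `concat_slices` and `split_slices` are hypothetical helpers, not part of nsgt; the slice count `T = 10` is also hypothetical.

```python
import numpy as np

T, C, F, M = 10, 2, 126, 304  # hypothetical slice count; other dims as reported
rng = np.random.default_rng(0)
forward = (rng.standard_normal((T, C, F, M))
           + 1j * rng.standard_normal((T, C, F, M))).astype(np.complex64)

def concat_slices(coefs):
    """(T, C, F, M) -> (C, T*M, F): put slices end to end along time."""
    t, c, f, m = coefs.shape
    return coefs.transpose(1, 0, 3, 2).reshape(c, t * m, f)

def split_slices(matrix, m):
    """Exact inverse of concat_slices, given the per-slice length m."""
    c, tm, f = matrix.shape
    return matrix.reshape(c, tm // m, m, f).transpose(1, 0, 3, 2)

tf = concat_slices(forward)
print(tf.shape)  # (2, 3040, 126)
```

This round trip is lossless, but if adjacent slices overlap in time (as the transition-area parameter suggests), the concatenated time axis covers each overlapping stretch twice, so it is not a uniform time grid.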

https://github.com/grrrr/nsgt/blob/master/examples/spectrogram.py#L19

Is this what the assemble_coeffs function is for? Is there a corresponding way to disassemble the result?
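A rough sketch of overlap-adding slices into one continuous time axis, roughly in the spirit of what assemble_coeffs in the linked example appears to do. The 50% overlap between adjacent slices is an assumption, and `overlap_add_slices` is a hypothetical helper, not the library's code.

```python
import numpy as np

def overlap_add_slices(coefs):
    """Sum consecutive slices with a hop of half a slice along the last axis.

    Hypothetical helper, assuming adjacent slices overlap by 50%.
    coefs: (T, C, F, M) with M even -> (C, F, (T + 1) * M // 2)
    """
    t, c, f, m = coefs.shape
    hop = m // 2
    out = np.zeros((c, f, (t + 1) * hop), dtype=coefs.dtype)
    for i in range(t):
        # slice i covers columns [i * hop, i * hop + m)
        out[..., i * hop:i * hop + m] += coefs[i]
    return out

T, C, F, M = 10, 2, 126, 304  # hypothetical slice count
tf = overlap_add_slices(np.ones((T, C, F, M), dtype=np.complex64))
print(tf.shape)  # (2, 126, 1672)
```

Because overlapping regions are summed, the individual slices cannot in general be recovered from this matrix by reshaping alone; a "disassemble" step would need the original slices kept around.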