About input data

Question

Opened this issue a year ago · 9 comments

Hi,
Is the model's input made up of the real and imaginary parts extracted after the STFT, or is only the amplitude taken as input after the STFT?
Thanks!

Answer 1 · 2023-08-14T08:35:47.000Z

Hi,
The previous papers used only the amplitude spectrogram.
Recently, we found that the complex spectrogram (real + imaginary parts from the STFT) yields slightly better performance.
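
To make the two input choices concrete, here is a minimal sketch of how each could be built from an STFT. It uses NumPy only to stay framework-neutral; the `stft` helper, the window, `n_fft`, `hop`, and the test tone are all illustrative assumptions, not the settings used in the papers or the repository.

```python
import numpy as np

def stft(signal, n_fft=512, hop=128):
    """Naive STFT (illustrative): Hann-windowed frames followed by a one-sided FFT.

    Returns a complex array of shape [frames, n_fft // 2 + 1].
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

# Hypothetical input: a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

spec = stft(x)  # complex spectrogram, shape [frames, freq_bins]

# Option 1: amplitude spectrogram, one channel per time-frequency bin.
amplitude_input = np.abs(spec)[..., None]                  # [frames, freq_bins, 1]

# Option 2: complex spectrogram, real and imaginary parts as two channels.
complex_input = np.stack([spec.real, spec.imag], axis=-1)  # [frames, freq_bins, 2]
```

The two-channel version keeps the phase information that the amplitude-only input discards, which may be part of why it performs slightly better.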

Answer 2 · 2023-08-14T09:14:05.000Z

Thank you for your reply. Also, regarding the SDR loss function, is it computed like this?

# Ratio inside the log: (||y_true - y_estimate||^2 + 0.001*||y_true||^2) / ||y_true||^2
sdr = (tf.math.square(tf.norm(y_true - y_estimate)) + 0.001*tf.math.square(tf.norm(y_true))) / (tf.math.square(tf.norm(y_true)) + 1e-8)

# log10(x) = ln(x) / ln(10)
num = tf.math.log(sdr + 1e-8)
den = tf.math.log(tf.constant(10, dtype=num.dtype))

sdr_loss = 10 * (num / den)

Answer 3 · 2023-08-14T09:47:42.000Z

norm = torch.sum(s1*s2, -1, keepdim=True)

torch.mean(10*torch.log10( sdr calculation in Eq. ))

Answer 4 · 2023-08-14T10:20:35.000Z

Thank you for your reply. I know the equation is:

sdr = 10 * log10( (||y_true - y_estimate||^2 + β*||y_true||^2) / ||y_true||^2 )

I don't quite understand the role of "norm = torch.sum(s1*s2, -1, keepdim=True)", or whether my formula is incorrect.

Answer 5 · 2023-08-14T12:04:38.000Z

My implementation is in PyTorch; the TensorFlow version probably performs the same calculation.

Answer 6 · 2023-08-15T02:20:30.000Z

Thank you for your reply. May I ask whether my SDR formula and code are correct? Thanks!

Answer 7 · 2023-08-15T02:28:58.000Z

norm_true = torch.sum(y_true*y_true, -1, keepdim=True)
norm_diff = torch.sum((y_esti-y_true)*(y_esti-y_true), -1, keepdim=True)
sdr_loss = torch.mean(10*torch.log10( ... ))

If the TensorFlow calculation follows the formula, there will be no problem.
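
For reference, here is a minimal NumPy sketch of the loss that combines the equation quoted earlier in this thread with the per-sample squared norms above. What goes inside the elided `log10( ... )` is not confirmed here, so treat the ratio as an assumption taken from the quoted equation. The values `beta=1e-3` and `eps=1e-8` come from the TensorFlow snippet in the question, and the function name `sdr_loss` is hypothetical.

```python
import numpy as np

def sdr_loss(y_true, y_esti, beta=1e-3, eps=1e-8):
    """Sketch of the SDR-style loss discussed in this thread (assumed form).

    y_true, y_esti: arrays of shape [batch, samples].
    beta and eps are assumptions taken from the question's snippet.
    """
    # Squared L2 norm of the target over the last axis, kept as [batch, 1].
    norm_true = np.sum(y_true * y_true, axis=-1, keepdims=True)
    # Squared L2 norm of the estimation error, also [batch, 1].
    diff = y_esti - y_true
    norm_diff = np.sum(diff * diff, axis=-1, keepdims=True)
    # Ratio from the quoted equation; eps guards against division by zero.
    ratio = (norm_diff + beta * norm_true) / (norm_true + eps)
    # Average the per-sample value in dB; lower means a better estimate.
    return np.mean(10.0 * np.log10(ratio + eps))

# Usage with hypothetical data: a perfect estimate versus an all-zero estimate.
y = np.ones((2, 100))
loss_perfect = sdr_loss(y, y)
loss_zero = sdr_loss(y, np.zeros_like(y))
```

Note that the norm is taken per sample along the last axis and only then averaged over the batch, whereas `tf.norm` without an `axis` argument reduces over the whole tensor. The two agree for a single example but can differ for batches.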

Answer 8 · 2023-08-17T02:52:11.000Z

Thank you for your reply. According to the paper, is the model structure like this?
conv_1 = Conv2D(16, (2,5), (1,1), padding='same')(..)
bn_1 = BatchNormalization()(conv_1)

bi_ls_1 = Bidirectional(LSTM(units=64, ...))(bn_1)
bi_ls_2 = Bidirectional(LSTM(units=64, ...))(bi_ls_1)

full_1 = Dense(32)(bi_ls_2)
ac_1 = ReLU()(full_1)

ls_1 = LSTM(128, ...)(ac_1)
ls_2 = LSTM(128, ...)(ls_1)

full_2 = Dense(2)(ls_2)

Thanks!

Answer 9 · 2023-08-17T08:43:45.000Z

Just be careful about the input to the Bidirectional and LSTM layers:
Bidirectional: runs along the frequency axis.
LSTM: runs along the time axis, with parameters shared across every frequency-axis input (Subband Network).
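
One way to read this advice is as a pair of reshapes applied before each recurrent stage. The NumPy sketch below shows only the axis handling, not the layers themselves. The `[batch, time, freq, channels]` layout and all sizes are assumptions for illustration, and this is an interpretation of the remark above rather than the repository's code.

```python
import numpy as np

# Hypothetical sizes: batch, time frames, frequency bins, feature channels.
B, T, F, C = 2, 10, 6, 16
features = np.random.default_rng(0).standard_normal((B, T, F, C))

# Frequency-axis bidirectional stage: every time frame becomes one sequence
# whose steps run over the F frequency bins.
freq_sequences = features.reshape(B * T, F, C)                        # [B*T, F, C]

# Time-axis stage with shared parameters: every frequency bin becomes one
# sequence whose steps run over the T frames. Folding F into the batch axis
# means a single LSTM (one set of weights) processes all bins.
time_sequences = features.transpose(0, 2, 1, 3).reshape(B * F, T, C)  # [B*F, T, C]
```

After each stage the output would be reshaped back to the 4-D layout. Feeding the unreshaped tensor to a standard recurrent layer would mix up which axis is treated as the sequence.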