ffxiong/stsubnet

About input data


panhu commented

Hi,
Is the model input the real and imaginary parts of the STFT, or the amplitude (magnitude) spectrogram computed from the STFT?
Thanks!

Hi,
The previous papers used only the amplitude spectrogram.
Recently, we found that the complex spectrogram (real and imaginary parts from the STFT) yields slightly better performance.
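To make the two input options concrete, here is a minimal NumPy sketch of both feature layouts. The frame size (512), hop (256), Hann window, and channels-last (frames, freq, channels) layout are illustrative assumptions, not settings taken from the repository, and the helper names are hypothetical.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Minimal Hann-windowed STFT (assumed settings); returns complex (frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def magnitude_features(x):
    # Amplitude spectrogram only: one channel, phase is discarded
    return np.abs(stft(x))[..., np.newaxis]            # (T, F, 1)

def complex_features(x):
    # Real and imaginary parts stacked as two channels: phase is preserved
    spec = stft(x)
    return np.stack([spec.real, spec.imag], axis=-1)   # (T, F, 2)
```

For one second of 16 kHz audio these settings give 61 frames by 257 bins. The magnitude is recoverable from the two complex channels via np.hypot(real, imag), but not the other way around.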

panhu commented

Thank you for your reply. Also, is the SDR loss function computed like this?

import tensorflow as tf

sdr = (tf.math.square(tf.norm(y_true - y_estimate)) + 0.001*tf.math.square(tf.norm(y_true))) / (tf.math.square(tf.norm(y_true)) + 1e-8)

num = tf.math.log(sdr + 1e-8)
denom = tf.math.log(tf.constant(10, dtype=num.dtype))

sdr_loss = 10 * (num / denom)  # ln(x)/ln(10) == log10(x)
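The TensorFlow snippet above computes 10*log10 via ln(x)/ln(10). Below is a NumPy mirror of it for checking the arithmetic offline. One detail worth noting: tf.norm with its default axis=None treats the whole tensor, batch included, as a single vector, so this yields one ratio per batch. β = 0.001 and ε = 1e-8 are copied from the snippet; the function name is hypothetical.

```python
import numpy as np

def sdr_loss_global(y_true, y_est, beta=1e-3, eps=1e-8):
    """NumPy mirror of the TensorFlow snippet (hypothetical name).

    np.linalg.norm(x) with axis=None and ord=None flattens the whole array,
    matching tf.norm's default axis=None: one norm over the entire batch.
    """
    err = np.linalg.norm(y_true - y_est) ** 2   # ||y_true - y_est||^2
    ref = np.linalg.norm(y_true) ** 2           # ||y_true||^2
    ratio = (err + beta * ref) / (ref + eps)
    # ln(ratio) / ln(10) is the same quantity as log10(ratio)
    return 10 * np.log(ratio + eps) / np.log(10.0)
```

With a perfect estimate the error term vanishes and the loss bottoms out near 10*log10(β) = -30 dB, so β effectively sets a floor on the loss.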

norm = torch.sum(s1*s2, -1, keepdim=True)

torch.mean(10*torch.log10( sdr calculation in Eq. ))

panhu commented

Thank you for your reply. I know the equation is:

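$$\mathrm{sdr} = 10\log_{10}\left(\frac{\lVert y_{\mathrm{true}} - y_{\mathrm{estimate}}\rVert^2 + \beta\,\lVert y_{\mathrm{true}}\rVert^2}{\lVert y_{\mathrm{true}}\rVert^2}\right)$$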

I don't quite understand the role of "norm = torch.sum(s1*s2, -1, keepdim=True)". Or is my formula incorrect?

My implementation is in PyTorch; the two should probably compute the same thing.
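When s1 and s2 are the same tensor, that line is just the squared L2 norm ||s||^2 of each example, taken along the last (sample) axis; keepdim=True keeps a size-1 axis so the result broadcasts against the signal. A NumPy equivalent for illustration (the helper name inner is hypothetical):

```python
import numpy as np

def inner(s1, s2):
    # NumPy equivalent of torch.sum(s1*s2, -1, keepdim=True):
    # element-wise product, summed over the last axis, which is kept as size 1
    return np.sum(s1 * s2, axis=-1, keepdims=True)

batch = np.array([[3.0, 4.0],
                  [1.0, 2.0]])          # two toy "signals"
sq_norms = inner(batch, batch)          # [[25.], [5.]]: ||s||^2 per row
```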

panhu commented

Thank you for your reply. May I ask whether the formula and code for my SDR loss are correct? Thanks!

import torch

norm_true = torch.sum(y_true*y_true, -1, keepdim=True)
norm_diff = torch.sum((y_esti-y_true)*(y_esti-y_true), -1, keepdim=True)
sdr_loss = torch.mean(10*torch.log10( ... ))

If the TensorFlow calculation follows the formula, there should be no problem.
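Assuming the "..." combines norm_diff and norm_true as in the equation, a per-example version can be sketched in NumPy: squared norms along the last axis, 10*log10 per example, then a batch mean. β and ε are borrowed from the TensorFlow snippet earlier in the thread; the function name is hypothetical.

```python
import numpy as np

def sdr_loss_per_example(y_true, y_est, beta=1e-3, eps=1e-8):
    """Per-example SDR-style loss averaged over the batch (hypothetical name).

    Mirrors the norm_true / norm_diff PyTorch lines: squared L2 norms along
    the last (sample) axis, kept as size-1 axes.
    """
    norm_true = np.sum(y_true * y_true, axis=-1, keepdims=True)
    diff = y_est - y_true
    norm_diff = np.sum(diff * diff, axis=-1, keepdims=True)
    ratio = (norm_diff + beta * norm_true) / (norm_true + eps)
    # 10*log10 per example, then the mean over the batch
    return np.mean(10 * np.log10(ratio + eps))
```

Unlike the tf.norm version with its default axis=None, which pools every example into one ratio, this averages per-example dB values, so a quiet utterance weighs as much as a loud one. The two agree when the batch holds a single example.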

panhu commented

Thank you for your reply. According to the paper, is the model structure like this?
from tensorflow.keras.layers import Conv2D, BatchNormalization, Bidirectional, LSTM, Dense, ReLU

conv_1 = Conv2D(16, (2, 5), (1, 1), padding='same')(..)
bn_1 = BatchNormalization()(conv_1)

bi_ls_1 = Bidirectional(LSTM(units=64, ...))(bn_1)
bi_ls_2 = Bidirectional(LSTM(units=64, ...))(bi_ls_1)

full_1 = Dense(32)(bi_ls_2)
ac_1 = ReLU()(full_1)

ls_1 = LSTM(128, ...)(ac_1)
ls_2 = LSTM(128, ...)(ls_1)

full_2 = Dense(2)(ls_2)

Thanks!

Just be careful with the input to the Bidirectional and LSTM layers:
Bidirectional: runs along the frequency axis.
LSTM: runs along the time axis, with parameters shared across every frequency-axis input (the subband network).
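The point about axes can be illustrated with reshapes alone. Assuming channels-last activations of shape (batch, time, freq, channels), which is our assumption rather than the repository's documented layout, the frequency BiLSTM sees each frame as one sequence over frequency, while the subband LSTM sees each frequency bin as one sequence over time. Folding all B*F bins into the batch dimension is what lets a single LSTM share its weights across bins. The helper names are hypothetical.

```python
import numpy as np

def to_freq_sequences(x):
    """(B, T, F, C) -> (B*T, F, C): one sequence over frequency per frame.

    These are the sequences a frequency-axis BiLSTM would consume.
    """
    B, T, F, C = x.shape
    return x.reshape(B * T, F, C)

def to_subband_sequences(h):
    """(B, T, F, H) -> (B*F, T, H): one sequence over time per frequency bin.

    Running one LSTM over all B*F sequences shares its parameters across bins.
    """
    B, T, F, H = h.shape
    return h.transpose(0, 2, 1, 3).reshape(B * F, T, H)

def from_subband_sequences(s, B):
    """(B*F, T, H) -> (B, T, F, H): undo to_subband_sequences."""
    BF, T, H = s.shape
    return s.reshape(B, BF // B, T, H).transpose(0, 2, 1, 3)
```

In Keras, one option is wrapping the recurrent layer in TimeDistributed, which applies the same layer to every slice along axis 1; for the subband LSTM you would first swap time and frequency (for example with Permute((2, 1, 3))) so that frequency sits on axis 1.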