Andong-Li-speech/GaGNet

Discussion about the causality of nn.InstanceNorm1d and nn.InstanceNorm2d


Hi,

I noticed that GaGNet uses InstanceNorm in these places:

```python
nn.InstanceNorm2d(c, affine=True),
nn.InstanceNorm1d(cd1, affine=True),
```

For InstanceNorm2d, the input shape is [batch, channel, num_frames, freq_feature_size], and the mean and variance are computed per instance over [num_frames, freq_feature_size], i.e., over all frames at once. So InstanceNorm2d appears to be non-causal.

The same applies to InstanceNorm1d.
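To illustrate the point, here is a minimal check (the shapes are my own illustrative choices, not GaGNet's actual ones): perturbing only the last frame changes the normalized output of the first frame, which a causal layer would never do.

```python
import torch
import torch.nn as nn

batch, channels, num_frames, freq_bins = 1, 4, 100, 161
norm = nn.InstanceNorm2d(channels, affine=True)

x = torch.randn(batch, channels, num_frames, freq_bins)
y1 = norm(x)

# Perturb only the *last* frame; a causal layer must leave earlier outputs unchanged.
x2 = x.clone()
x2[:, :, -1, :] += 10.0
y2 = norm(x2)

# Prints False: the output at frame 0 changes, because the per-instance
# mean/variance are computed over all frames, including future ones.
print(torch.allclose(y1[:, :, 0, :], y2[:, :, 0, :]))
```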


Thanks for your comment. Indeed, in the training phase the mean and variance statistics for IN or BN are calculated over the whole spectrogram. However, they will be fixed in the inference phase. Although it is not strictly causal in the training phase, I think it is okay for causal inference. Actually, you may as well replace the 1-D (2-D) IN with LN or a cumulative-style norm to see the performance, and I am also looking forward to seeing the results : )
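For reference, here is a minimal sketch of what a cumulative-style (causal) normalization could look like for a [batch, channel, num_frames, freq] input, in the spirit of the cumulative layer norm (cLN) from Conv-TasNet. The module name, shapes, and parameterization are my own assumptions, not code from GaGNet:

```python
import torch
import torch.nn as nn

class CumulativeLayerNorm2d(nn.Module):
    """Causal normalization: statistics at frame t use only frames 0..t.

    Shapes and the module name are illustrative, not from the GaGNet code.
    """
    def __init__(self, channels, freq_bins, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(1, channels, 1, freq_bins))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, freq_bins))

    def forward(self, x):
        # x: [batch, channel, num_frames, freq_bins]
        b, c, t, f = x.shape
        step_sum = x.sum(dim=(1, 3), keepdim=True)            # [b, 1, t, 1]
        step_sq_sum = (x ** 2).sum(dim=(1, 3), keepdim=True)  # [b, 1, t, 1]
        cum_sum = step_sum.cumsum(dim=2)                      # running sums over time
        cum_sq_sum = step_sq_sum.cumsum(dim=2)
        # Number of elements seen up to each frame: (t_idx + 1) * c * f
        count = torch.arange(1, t + 1, device=x.device, dtype=x.dtype)
        count = count.view(1, 1, t, 1) * (c * f)
        cum_mean = cum_sum / count
        cum_var = cum_sq_sum / count - cum_mean ** 2
        return (x - cum_mean) / torch.sqrt(cum_var + self.eps) * self.gain + self.bias

x = torch.randn(2, 4, 100, 161)
y = CumulativeLayerNorm2d(channels=4, freq_bins=161)(x)  # frame t sees only frames 0..t
```

Because the running sums only accumulate forward in time, this stays causal in both training and inference, unlike plain IN.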

Thanks for the reply. By default, there is a small difference between BN and IN here: BN uses its smoothed running statistics during inference, but IN does not. As you say, BN, LN, and cumulative normalization can all be good choices.
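This difference is visible directly in the PyTorch defaults (BatchNorm2d has track_running_stats=True, InstanceNorm2d has track_running_stats=False), so a quick sanity check along these lines should show it:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)        # track_running_stats=True by default
inorm = nn.InstanceNorm2d(4)  # track_running_stats=False by default

x = torch.randn(8, 4, 100, 161)
_ = bn(x)     # one training pass updates BN's running mean/var
_ = inorm(x)  # IN keeps no running statistics at all

bn.eval()
inorm.eval()
# In eval mode BN normalizes with its fixed running statistics...
print(bn.running_mean is not None)  # True
# ...while IN still computes statistics from the current input,
# over all frames, even at inference time.
print(inorm.running_mean is None)   # True
```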