ffxiong/stsubnet

Questions about the details of the model

Closed this issue · 6 comments

Dear authors,

I'm confused about some details of the model. Specifically, as I understand it, before the 2D conv in the STRF Extractor we need to extract the subbands with F.unfold, so that the input tensor to the 2D conv has shape [B, F, T, 1, 31, 15]. I also guess that after batchnorm a fully connected layer is needed to project the tensor from [B, F, T, 31 * 15] to [B, F, T, D1].

I'm not sure whether I'm understanding this correctly. I sincerely look forward to your reply.

Best.

Hi,

2D conv (self.conv = nn.Conv2d(1, D1, (31, 5))) will take the padded SxT (nn.functional.pad) as input.
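For readers following along, here is a minimal sketch of the shape flow this reply describes. Only the kernel size (31, 5) and the use of nn.functional.pad come from the reply. The tensor sizes, the value of D1, and the symmetric zero padding are assumptions for illustration; the thread does not say how the padding is split across the frequency and time axes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, chosen only to keep the example small.
B, n_freq, n_frames, D1 = 2, 64, 50, 16
kf, kt = 31, 5  # kernel size quoted in the reply

conv = nn.Conv2d(1, D1, (kf, kt))

spec = torch.randn(B, 1, n_freq, n_frames)  # [B, 1, F, T]

# F.pad takes pads for the last dimension first:
# (T_left, T_right, F_top, F_bottom).
# Symmetric padding is an assumption, not confirmed by the thread.
padded = F.pad(spec, (kt // 2, kt // 2, kf // 2, kf // 2))

out = conv(padded)  # one D1-dimensional vector per (F, T) position

print(padded.shape)  # torch.Size([2, 1, 94, 54])
print(out.shape)     # torch.Size([2, 16, 64, 50])
```

With this layout no explicit F.unfold or separate fully connected layer is needed for the projection: each output position already sees a 31 x 5 patch of the padded spectrogram, and the D1 output channels of the convolution act as the linear projection of that patch.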

Got it, thanks for your reply.

Hello author,
Can you help me with my doubts? Is the input size of the SubNet module [B*F, D_3, T] (D_3 is hidden_size)?

I think so.

Ok thanks, this basically solved my confusion

Thanks, hopkin-ghp.
Yes, the input to the SubNet is [B x F, D_3, T], e.g., [32 x 481, 128, 401].
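As a rough illustration of how a [B x F, D_3, T] input can be formed, the sketch below folds the frequency axis into the batch axis. The upstream layout [B, D_3, F, T] and the small sizes are assumptions made for the example; the thread only confirms the final shape.

```python
import torch

# Hypothetical small sizes; the thread's example uses B=32, F=481, D_3=128, T=401.
B, D3, n_freq, n_frames = 2, 8, 5, 7

# Assumed upstream layout: [B, D_3, F, T].
feats = torch.randn(B, D3, n_freq, n_frames)

# Move F next to B, then merge the two axes so that every subband
# becomes its own item in the batch: [B, F, D_3, T] -> [B * F, D_3, T].
subnet_in = feats.permute(0, 2, 1, 3).reshape(B * n_freq, D3, n_frames)

print(subnet_in.shape)  # torch.Size([10, 8, 7])

# Row b * F + f holds subband f of example b.
same = torch.equal(subnet_in[1 * n_freq + 3], feats[1, :, 3, :])
print(same)  # True
```

With the sizes from the reply, the first dimension would be 32 x 481 = 15392, so the SubNet processes every subband of every example as an independent sequence of length T.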