Problem with the architecture of the attention modules.
QJ-Chen opened this issue · 0 comments
QJ-Chen commented
In `model.attention`, `AttentionModule1.shortcut_short` is defined but never used. Instead, the shortcut is computed with the downsample weights:

```
shortcut_short = self.soft_resdown3(x_s)
```

Similarly, `AttentionModule3.shortcut_short` is unnecessary.
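A minimal sketch of the suspected issue, for illustration only: the layer names follow the report, but the internals are simplified stand-in convolutions, not the repo's actual residual blocks. The helper `unused_parameters` is a hypothetical diagnostic showing how to confirm that a submodule defined in `__init__` never participates in the forward pass.

```python
import torch
import torch.nn as nn

class AttentionModule1(nn.Module):
    """Simplified stand-in reproducing the reported pattern."""

    def __init__(self, channels=4):
        super().__init__()
        self.soft_resdown3 = nn.Conv2d(channels, channels, 1)
        # Defined here but never called in forward():
        self.shortcut_short = nn.Conv2d(channels, channels, 1)

    def forward(self, x_s):
        # Reported behavior: the shortcut reuses the downsample weights.
        shortcut_short = self.soft_resdown3(x_s)
        # Presumably intended instead:
        # shortcut_short = self.shortcut_short(x_s)
        return shortcut_short

def unused_parameters(module, x):
    """Return names of parameters that receive no gradient, i.e. layers
    that do not contribute to the output."""
    module(x).sum().backward()
    return [name for name, p in module.named_parameters() if p.grad is None]
```

Running `unused_parameters(AttentionModule1(), torch.randn(1, 4, 8, 8))` lists the `shortcut_short` weights, confirming that branch is dead.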