AlexHex7/Non-local_pytorch

About the bias of 3d convolutions in the attention block

liucu0135 opened this issue · 1 comment

Hello, I noticed you did not set bias=False in the 1x1 3D convolution layers, which implement:

phi   = W_phi * x + B_phi
g     = W_g * x + B_g
theta = W_theta * x + B_theta
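
For concreteness, something like this minimal sketch (illustrative channel sizes, not the repo's exact code; bias=True is PyTorch's default for nn.Conv3d):

```python
import torch
import torch.nn as nn

# The three projections of a non-local block as 1x1x1 3D convolutions.
# With bias=True each layer computes W * x + B; with bias=False, just W * x.
in_channels, inter_channels = 64, 32  # illustrative values

theta = nn.Conv3d(in_channels, inter_channels, kernel_size=1, bias=True)
phi = nn.Conv3d(in_channels, inter_channels, kernel_size=1, bias=True)
g = nn.Conv3d(in_channels, inter_channels, kernel_size=1, bias=True)

x = torch.randn(2, in_channels, 4, 16, 16)  # (N, C, T, H, W)
print(theta(x).shape, phi(x).shape, g(x).shape)  # each: (2, 32, 4, 16, 16)
```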

I have read some materials and papers, but none of them mention whether there are bias terms such as B_phi, B_g, and B_theta.
I tried my implementation with bias=True, just as you did, and it did improve performance.

I just want to ask how you arrived at the idea of setting bias=True (the default), in case I missed something in my reading.

Hi @liucu0135. In fact, I overlooked this point.
My guess is that the bias parameters improve the model's fitting ability. In some algorithms, the conv layers omit the bias parameters in order to avoid overfitting.
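
For example, one common reason to drop the bias (a sketch of the general convention, not necessarily this repo's code): when a conv is immediately followed by BatchNorm, the conv bias is redundant, since BN subtracts the per-channel mean and adds its own learnable shift.

```python
import torch.nn as nn

# Hypothetical output branch: BatchNorm3d absorbs any conv bias
# (it subtracts the per-channel mean and adds its own beta shift),
# so bias=False saves parameters with no loss of expressiveness.
W_z = nn.Sequential(
    nn.Conv3d(32, 64, kernel_size=1, bias=False),
    nn.BatchNorm3d(64),
)
```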