LayneH/GreenMIM

About the mask in the relative position bias table

yeppp27 opened this issue · 0 comments

Hello! Thanks for your interesting work! I have a question about the mask applied to the relative position bias: if the mask is already added to the attention scores (attn + mask), why do we still need to multiply the relative position bias by a mask, i.e. relative_position_bias = relative_position_bias * rel_pos_mask.view(-1, N, N, 1), in the window group attention?
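
To make the question concrete, here is a minimal sketch of the two operations on a toy example. It is not the repository's code: the 4-token layout, the two groups, the score and bias values, and the helper names (`softmax`, `attend`, `add_mask`, `mul_mask`) are all hypothetical, and it assumes the additive mask is negative infinity for token pairs in different groups. Under those assumptions, the attention weights come out the same with or without the multiplication.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, bias, add_mask, mul_mask=None):
    # scores + (optionally masked) bias + additive mask, then row-wise softmax.
    n = len(scores)
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            keep = mul_mask[i][j] if mul_mask is not None else 1.0
            row.append(scores[i][j] + bias[i][j] * keep + add_mask[i][j])
        out.append(softmax(row))
    return out

# Hypothetical setup: 4 tokens, tokens 0-1 in one group and 2-3 in another.
N = 4
group = [0, 0, 1, 1]
same = [[group[i] == group[j] for j in range(N)] for i in range(N)]

# Assumption: the additive mask is -inf across groups and 0 within a group.
add_mask = [[0.0 if same[i][j] else float("-inf") for j in range(N)] for i in range(N)]
# Multiplicative mask standing in for rel_pos_mask: 1 within a group, 0 across.
mul_mask = [[1.0 if same[i][j] else 0.0 for j in range(N)] for i in range(N)]

# Arbitrary toy attention scores and relative position bias values.
scores = [[0.1 * (i + 1) * (j + 1) for j in range(N)] for i in range(N)]
bias = [[0.5 * (i - j) for j in range(N)] for i in range(N)]

with_mul = attend(scores, bias, add_mask, mul_mask)
without_mul = attend(scores, bias, add_mask)

print(with_mul == without_mul)
```

With a finite mask value instead of negative infinity, the two variants could differ slightly on the cross-group entries. The sketch also does not cover any other role the multiplication may play in the actual implementation, which is what I would like to understand.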