About the mask in the relative position bias table
yeppp27 opened this issue
Hello! Thanks for your interesting work! I have a question about the mask in the relative position bias table. Since the mask is already added to the attention scores (`attn + mask`), why do we also need to multiply the relative position bias by the mask, as in `relative_position_bias = relative_position_bias * rel_pos_mask.view(-1, N, N, 1)` in the window group attention?
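To make the question concrete, here is a toy sketch in plain Python for a single query row. All names are hypothetical, and it assumes that the additive mask is `-inf` exactly where the 0/1 `rel_pos_mask` is 0. Under that assumption, multiplying the bias by the mask does not change the attention weights. The sketch does not cover a finite mask value (e.g. `-100`) or any other use the repository might make of the bias.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats; -inf entries map to 0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, bias, keep, multiply_bias_by_mask):
    # scores, bias: per-key logits for one query (hypothetical toy values).
    # keep: 1 = key is visible, 0 = key is masked (stand-in for rel_pos_mask).
    # Additive mask (assumption): 0 for kept keys, -inf for masked keys.
    add_mask = [0.0 if k else -math.inf for k in keep]
    if multiply_bias_by_mask:
        # The extra step asked about: zero the bias at masked positions.
        bias = [b * k for b, k in zip(bias, keep)]
    logits = [s + b + m for s, b, m in zip(scores, bias, add_mask)]
    return softmax(logits)

scores = [0.5, 1.2, -0.3, 2.0]
bias = [0.1, -0.4, 3.0, 0.7]
keep = [1, 1, 0, 1]

with_mul = attention_weights(scores, bias, keep, True)
without_mul = attention_weights(scores, bias, keep, False)
print(with_mul == without_mul)  # True: the masked key gets weight 0 either way
```

With a `-inf` additive mask, any finite bias at a masked position is swamped, so both variants give identical weights. The question is whether the multiplication serves some other purpose in this implementation.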