F.multi_head_attention_forward missing parameter 'average_attn_weights=True'?

Question

Opened this issue a year ago · 1 comment

TypeError: multi_head_attention_forward got an unexpected keyword argument 'average_attn_weights'

Answer 1 · 2023-09-18T09:01:52.000Z

If your torch version is older than 2.0, you can simply remove the 'average_attn_weights=True' argument. In those versions 'multi_head_attention_forward' already averages the attention weights over the heads whenever need_weights is set, as this excerpt shows:

def multi_head_attention_forward(...):
    ...
    if need_weights:
        # average attention weights over heads
        attn_output_weights = attn_output_weights.view(bsz, num_heads, tgt_len, src_len)
        return attn_output, attn_output_weights.sum(dim=1) / num_heads
    else:
        return attn_output, None
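
If the same code has to run on both older and newer torch versions, one option is to pass the argument only when the installed function accepts it. The following is a minimal sketch of that idea using only the standard library. The helper name call_with_supported_kwargs and the two stand-in functions are hypothetical and exist only for illustration; they are not part of torch.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, every keyword is accepted, so drop nothing.
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)


# Hypothetical stand-ins for an older and a newer signature (NOT the real torch functions).
def old_forward(query, need_weights=True):
    return ("old", query, need_weights)


def new_forward(query, need_weights=True, average_attn_weights=True):
    return ("new", query, need_weights, average_attn_weights)


# The unsupported keyword is silently dropped for the old signature
# and passed through unchanged for the new one.
old_result = call_with_supported_kwargs(old_forward, "q", average_attn_weights=True)
new_result = call_with_supported_kwargs(new_forward, "q", average_attn_weights=True)
```

In real code the first argument would be F.multi_head_attention_forward rather than a stand-in. Note that this only makes sense for average_attn_weights=True: the older implementation shown above always averages over heads, so silently dropping average_attn_weights=False would change the result.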