A question about the attention score computation

Hi,
I have a question about how the attention score is computed in Text_Texture_Erase_And_Enhance_Module.

Line 138 in c8357ea:

```python
x_out = (torch.bmm(x_out, b_att_1.permute(0, 2, 1))).view(m_batchsize, c, width, height)
```
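The sketch below makes explicit which axis this line sums over. It uses made-up sizes and assumes that `x_out` has shape `(m_batchsize, c, N)` and `b_att_1` has shape `(m_batchsize, N, N)` with `N = width * height`. Neither definition appears in the snippet, so treat both shapes as assumptions.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
B, C, W, H = 2, 3, 4, 5
N = W * H

x_out = torch.randn(B, C, N)    # assumed (batch, channels, positions)
b_att_1 = torch.randn(B, N, N)  # assumed (batch, query i, key j)

# The repository line, without the final .view(...).
out = torch.bmm(x_out, b_att_1.permute(0, 2, 1))

# Equivalent einsum: out[b, c, i] = sum_j x_out[b, c, j] * b_att_1[b, i, j],
# i.e. the sum runs over the LAST axis (j) of b_att_1.
ref = torch.einsum("bcj,bij->bci", x_out, b_att_1)

print(out.shape)                            # torch.Size([2, 3, 20])
print(torch.allclose(out, ref, atol=1e-5))  # True
```

Under these assumptions, the output at position `i` is weighted by row `i` of `b_att_1`. For that output to be a weighted average of `x_out`, each row would need to sum to 1 over `j`.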

```python
self.softmax = nn.Softmax(dim=1)
f1 = x_1.view(m_batchsize, -1, width * height)
b_att = torch.bmm(f1.permute(0, 2, 1), f1)
f1 = x_mask.view(m_batchsize, -1, width * height)
mask_att = torch.bmm(f1.permute(0, 2, 1), f1)
b_att = b_att * mask_att
b_att = b_att.view(m_batchsize, -1, width, height)
b_att = self.softmax(b_att)
```
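The following runnable reproduction of the score computation uses random inputs to check the shapes and the normalization axis. The sizes are hypothetical. The input shapes `(m_batchsize, channels, width, height)` for `x_1` and a 1-channel `x_mask` are also assumptions. Note that `view(m_batchsize, -1, width, height)` only splits the key axis into `(width, height)`, so `dim=1` of the 4-D tensor is still the query axis.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
m_batchsize, c, width, height = 2, 8, 4, 5
N = width * height

x_1 = torch.randn(m_batchsize, c, width, height)    # assumed feature map
x_mask = torch.rand(m_batchsize, 1, width, height)  # assumed 1-channel mask

softmax = nn.Softmax(dim=1)

f1 = x_1.view(m_batchsize, -1, width * height)     # (B, C, N)
b_att = torch.bmm(f1.permute(0, 2, 1), f1)         # (B, N, N): [b, i, j] = <f_i, f_j>
f1 = x_mask.view(m_batchsize, -1, width * height)  # (B, 1, N)
mask_att = torch.bmm(f1.permute(0, 2, 1), f1)      # (B, N, N)
b_att = b_att * mask_att
b_att = b_att.view(m_batchsize, -1, width, height) # (B, N, W, H): key axis split into W x H
b_att = softmax(b_att)                             # normalizes over dim 1 = query index i

print(b_att.shape)  # torch.Size([2, 20, 4, 5])

# Back to (B, N, N): the sums over the query axis (dim 1) are all 1.
att = b_att.view(m_batchsize, N, N)
print(torch.allclose(att.sum(dim=1), torch.ones(m_batchsize, N), atol=1e-6))  # True
```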

b_att is computed as (batchsize, query, dim) @ (batchsize, dim, key) = (batchsize, query, key).
The source code applies the softmax with dim=1 rather than dim=-1. Why isn't the softmax taken over the key dimension?
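One property may be relevant when comparing `dim=1` and `dim=-1`. Both `f1.permute(0, 2, 1) @ f1` and `mask_att` are Gram matrices, so the pre-softmax scores are symmetric (`b_att[b, i, j] == b_att[b, j, i]`). The sketch below uses random inputs and hypothetical sizes. It checks that on the `(B, N, N)` view, a softmax over `dim=1` is then the transpose of a softmax over `dim=-1`. This only illustrates the relationship and does not indicate which choice the authors intended.

```python
import torch

# Hypothetical sizes for illustration.
B, C, N = 2, 8, 20

f = torch.randn(B, C, N)  # stands in for x_1 flattened to (B, C, N)
m = torch.rand(B, 1, N)   # stands in for x_mask flattened to (B, 1, N)

# Same structure as b_att * mask_att in the snippet: (B, N, N)
scores = torch.bmm(f.transpose(1, 2), f) * torch.bmm(m.transpose(1, 2), m)
print(torch.allclose(scores, scores.transpose(1, 2), atol=1e-5))  # True: symmetric

att_query = torch.softmax(scores, dim=1)  # like nn.Softmax(dim=1): normalized over i
att_key = torch.softmax(scores, dim=-1)   # row-wise softmax: normalized over j

# With symmetric scores, normalizing over i equals the transpose of normalizing over j.
print(torch.allclose(att_query, att_key.transpose(1, 2), atol=1e-5))  # True
```

Separately, on the 4-D tensor returned by `view(m_batchsize, -1, width, height)`, `dim=-1` would normalize over the `height` axis alone. A softmax over all key positions would therefore have to be taken on the `(B, N, N)` view.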