softmax in GraphormerAttentionHead
JingweiQu opened this issue · 3 comments
JingweiQu commented
softmax = torch.softmax(a, dim=-1)
The softmax here is not quite right, since attention should be computed within each graph separately. Computing it directly over the batch allows attention between nodes that belong to different graphs.
Maybe combining it with batch_mask, replacing the 0 entries with -10^6, would be a solution.
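A minimal sketch of the idea (the helper name `masked_softmax` and the assumed shape of `batch_mask` are illustrative, not the repository's actual code):

```python
import torch

def masked_softmax(a: torch.Tensor, batch_mask: torch.Tensor) -> torch.Tensor:
    # a:          raw attention scores, shape (batch, n_nodes, n_nodes)
    # batch_mask: 1 where query and key nodes belong to the same graph, 0 otherwise
    # Fill cross-graph positions with a large negative value so their
    # softmax weights become effectively zero.
    a = a.masked_fill(batch_mask == 0, -1e6)
    return torch.softmax(a, dim=-1)

# Example: a batch of 2 graphs padded to 4 nodes each
scores = torch.randn(2, 4, 4)
mask = torch.ones(2, 4, 4)
mask[0, :, 2:] = 0  # pretend nodes 2-3 are padding / another graph
attn = masked_softmax(scores, mask)  # cross-graph weights are ~0
```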
leffff commented
Thanks for the issue!
Yeah, that sounds like a good solution!