Vector of learned coefficients for the attention mechanism and question about edge_index
DevBySam7 opened this issue · 4 comments
Hello, first of all thank you very much for making this amazing work public! The idea and the results are really impressive. So here are my questions:
- Is formula (7) of the paper right? As far as I understand, alpha = (key_i * cat_att_i).sum(-1) + (key_j * cat_att_j).sum(-1) in line 103 of the graph_layer doesn't reflect formula (7) (leaving the LeakyReLU aside).
- Is there any plan to create the edge_index with TopK inside the GDN module rather than outside with a separate function?
Thanks for your interest, too.
Hello, I appreciate your fast answers. Sorry if I was unclear; I'll try to reformulate my questions.
- If I compare the paper with the code, I understand:
a^T (g_i^(t) ⊕ g_j^(t)) = alpha = (key_i * cat_att_i).sum(-1) + (key_j * cat_att_j).sum(-1).
I was just asking whether this mathematical formulation (formula (7)) is correct, since I don't really see the equivalence, or whether the formula in the paper might be slightly incorrect.
- This question was more about the design idea behind where the topK is created relative to the GDN. I'm not sure, but I think it would make a bit more sense to return the embeddings from the GDN model at each epoch, calculate a new edge_index from them, and then feed them into the GDN again. But don't worry, it is really hard to formulate and I was just curious :). The first question is far more important to me.
Greetings :)
- The implementation here just decomposes the linear combination into two parts: instead of concatenating first and then applying the linear combination as in formula (7), we split it into g_i (key_i) and g_j (key_j), apply a linear combination to each part, and add the results. So the implementation has the same meaning as formula (7); it is still the same linear combination.
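To see why the split form equals the concatenated form of formula (7), here is a minimal numpy sketch (variable names are hypothetical, not taken from the repository):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
g_i = rng.standard_normal(d)      # node embedding g_i^(t) (made up for the demo)
g_j = rng.standard_normal(d)      # node embedding g_j^(t)
a = rng.standard_normal(2 * d)    # learned coefficient vector a

# Formula (7) style: concatenate first, then one dot product with a
concat_form = a @ np.concatenate([g_i, g_j])

# Implementation style: split a into two halves and sum the two parts,
# mirroring (key_i * cat_att_i).sum(-1) + (key_j * cat_att_j).sum(-1)
a_i, a_j = a[:d], a[d:]
split_form = (g_i * a_i).sum(-1) + (g_j * a_j).sum(-1)

print(np.allclose(concat_form, split_form))  # True
```

The two expressions are the same dot product, just computed in two halves.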
- If I understand correctly, you are suggesting that the edge index should be built from the output embeddings of the GDN, and that this new edge_index should then be used for the final embedding computation. That sounds like a two-layer GDN, where the first layer uses the current edge_index and the second layer uses the new edge_index derived from the first layer's output. I think it is plausible intuitively :).
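As a rough sketch of that re-selection step, the edge_index could be rebuilt from a batch of embeddings like this (topk_edge_index is a hypothetical helper, assuming cosine-similarity-based top-k selection; it is not the repository's actual function):

```python
import numpy as np

def topk_edge_index(emb, k):
    """Connect each node to its k most cosine-similar neighbors.
    Hypothetical helper for rebuilding edge_index from embeddings."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = norm @ norm.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)           # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # top-k neighbor ids per node
    src = nbrs.reshape(-1)                   # neighbor -> node edges
    dst = np.repeat(np.arange(emb.shape[0]), k)
    return np.stack([src, dst])              # shape (2, N*k), PyG convention

emb = np.random.default_rng(1).standard_normal((5, 16))  # 5 nodes, dim 16
edge_index = topk_edge_index(emb, k=2)
print(edge_index.shape)  # (2, 10)
```

In the two-layer idea, this helper would be called on the first layer's output embeddings before the second layer's message passing.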
Alright, this answers my questions. Thanks a lot for taking the time!