gordicaleksa/pytorch-GAT

Query regarding visualization of attention

Sowmya-R-Krishnan opened this issue · 0 comments

Thank you @gordicaleksa for the fantastic code and detailed documentation! It has helped me a lot in understanding the details of GAT.
While looking at the visualization functions in the code, I understand that entropy is used because the softmax applied over the attention coefficients brings them into the range [0, 1], so that they resemble a probability distribution. To obtain the attention coefficients from the GAT layer, you have used:

def visualize_entropy_histograms(model_name=r'gat_PPI_000000.pth', dataset_name=DatasetType.PPI.name):
    # Fetch the data we'll need to create visualizations
    all_nodes_unnormalized_scores, edge_index, node_labels, gat = gat_forward_pass(model_name, dataset_name)

all_nodes_unnormalized_scores comes from the GAT forward function:

out_nodes_features = self.skip_concat_bias(attentions_per_edge, in_nodes_features, out_nodes_features)
return (out_nodes_features, edge_index)

From my reading of the GAT paper (Petar Veličković et al.), the attention coefficients obtained after the softmax are used to compute the final output node features of the GAT layer. In the GAT implementation:

attentions_per_edge = self.neighborhood_aware_softmax(scores_per_edge, edge_index[self.trg_nodes_dim], num_of_nodes)

the above function gives the attention coefficients in the [0, 1] range. The subsequent functions (self.aggregate_neighbors and self.skip_concat_bias) then produce the final node features of the GAT layer. So is the "all_nodes_unnormalized_scores" variable used in the entropy histogram visualization function still in the range [0, 1]? Or is the entropy histogram visualizing the output node features rather than the softmax-normalized attention coefficients?
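For context, this is the kind of sanity check I was planning to run to tell the two cases apart (purely my own sketch - check_range, sums_to_one_per_target, att_per_edge, trg_index and num_nodes are hypothetical names, not from the repo): unnormalized logits can take any real value, while softmax-normalized attention coefficients lie in [0, 1] and sum to 1 over each target node's incoming edges.

import torch

def check_range(tensor, name):
    # Unnormalized scores/logits can be negative or greater than 1; softmaxed coefficients cannot.
    print(f'{name}: min={tensor.min().item():.4f}, max={tensor.max().item():.4f}')

def sums_to_one_per_target(att_per_edge, trg_index, num_nodes):
    # att_per_edge: (E,) attention values for a single head, trg_index: (E,) target node ids.
    # Softmax-normalized coefficients should sum to ~1 over every target node's incoming edges.
    sums = torch.zeros(num_nodes).scatter_add_(0, trg_index, att_per_edge)
    return torch.allclose(sums[sums > 0], torch.ones(1))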

I also came across the entropy visualization in a DGL tutorial on GAT (https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html), and there they use the attention coefficients after softmax normalization for the visualization. Sorry if the question is naive - I'm trying to apply this visualization to one of my projects involving inductive learning, roughly along the lines of the sketch at the end of this issue. Let me know if I have misunderstood the information being extracted from the GAT layer. Thanks in advance!
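For completeness, this is roughly how I intend to reproduce the DGL-style entropy histogram, assuming I can get hold of the per-edge softmax-normalized attention (the names att, trg and num_nodes below are my own placeholders, not the repo's API):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import entropy

def neighborhood_entropies(att, trg, num_nodes):
    # att: (E,) softmax-normalized attention for one head, trg: (E,) target node id per edge.
    # Group the coefficients by target node and compute the Shannon entropy of each
    # neighborhood's attention distribution (uniform attention gives the maximum, log(degree)).
    ents = []
    for node_id in range(num_nodes):
        neighborhood_att = att[trg == node_id]
        if len(neighborhood_att) > 0:
            ents.append(entropy(neighborhood_att))
    return np.array(ents)

# ents = neighborhood_entropies(att, trg, num_nodes)
# plt.hist(ents, bins=50)
# plt.xlabel('attention entropy per neighborhood')
# plt.show()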