Architecture of the aggregation in HeteroSAGEConv?
Hello!
Not really an issue, but I have a question about the implementation of the update step in `hetero_gnn.py`. What is the benefit of calculating the output via these lines:

```python
aggr_out = self.lin_neigh(aggr_out)
node_feature_self = self.lin_self(node_feature_self)
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)
```
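That is, one linear layer is applied to the aggregated neighbour features, another linear layer to the features of the node itself, and then a third layer to the concatenation of the two results. Writing $h_{\text{neigh}}$ for the aggregated neighbour features and $h_{\text{self}}$ for the node's own features, in terms of weight-matrix multiplications this represents:

$$
y = W_y \,\mathrm{CONCAT}\left(W_{\text{neigh}} h_{\text{neigh}} + b_{\text{neigh}},\; W_{\text{self}} h_{\text{self}} + b_{\text{self}}\right) + b_y
$$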
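A self-contained sketch of this update step, for experimenting outside deepsnap. The layer sizes are an assumption not checked against the library: `lin_neigh` and `lin_self` each map to `out_channels`, and `lin_update` maps the `2 * out_channels` concatenation back to `out_channels`. The class name `SeparateUpdate` and the example sizes are hypothetical.

```python
import torch
import torch.nn as nn


class SeparateUpdate(nn.Module):
    """Hypothetical stand-alone version of the quoted update step.

    Assumed sizes: both branches project to out_channels, and lin_update
    maps the 2 * out_channels concatenation back to out_channels.
    """

    def __init__(self, in_channels_self, in_channels_neigh, out_channels):
        super().__init__()
        self.lin_neigh = nn.Linear(in_channels_neigh, out_channels)
        self.lin_self = nn.Linear(in_channels_self, out_channels)
        self.lin_update = nn.Linear(2 * out_channels, out_channels)

    def forward(self, aggr_out, node_feature_self):
        # Project aggregated neighbour features and the node's own features separately.
        aggr_out = self.lin_neigh(aggr_out)
        node_feature_self = self.lin_self(node_feature_self)
        # Concatenate both projections and mix them with a third linear layer.
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)


layer = SeparateUpdate(in_channels_self=16, in_channels_neigh=8, out_channels=32)
out = layer(torch.randn(5, 8), torch.randn(5, 16))
print(out.shape)  # torch.Size([5, 32])
```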
I thought it would be simpler to use just:

```python
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)
```

where `self.lin_update` is now initialised as `self.lin_update = nn.Linear(self.in_channels_self + self.in_channels_neigh, self.out_channels)`, so we would no longer need the linear layers `self.lin_neigh` and `self.lin_self`?
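For comparison, a minimal sketch of the proposed single-layer update. The class name `ConcatUpdate` and the example sizes are hypothetical; only the `nn.Linear(in_channels_self + in_channels_neigh, out_channels)` initialisation comes from the proposal above.

```python
import torch
import torch.nn as nn


class ConcatUpdate(nn.Module):
    """Hypothetical single-layer update: one affine map over the raw concatenation."""

    def __init__(self, in_channels_self, in_channels_neigh, out_channels):
        super().__init__()
        self.lin_update = nn.Linear(in_channels_self + in_channels_neigh, out_channels)

    def forward(self, aggr_out, node_feature_self):
        # Concatenate the unprojected inputs, then apply a single linear layer.
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)


layer = ConcatUpdate(in_channels_self=16, in_channels_neigh=8, out_channels=32)
out = layer(torch.randn(5, 8), torch.randn(5, 16))
n_params = sum(p.numel() for p in layer.parameters())
print(out.shape, n_params)  # torch.Size([5, 32]) 800
```

Here the parameter count is just `(16 + 8) * 32` weights plus 32 biases.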
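This represents something like

$$
y = W'_y \,\mathrm{CONCAT}\left(h_{\text{neigh}}, h_{\text{self}}\right) + b'_y
$$

where CONCAT is the vector concatenation operator and the primes indicate that $W'_y$ and $b'_y$ are different parameters from $W_y$ and $b_y$; in particular, $W'_y$ now has a different input dimension.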
In terms of the number of parameters in the model, it doesn't make a huge difference, but by including these additional layers you get a more complex optimisation surface that involves a product of weight matrices. Would this not make it a bit harder for gradient descent to reach a good solution?
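Because the quoted update applies no activation between `lin_neigh`/`lin_self` and `lin_update`, the three affine maps compose into a single affine map of the concatenated inputs: splitting $W_y$ column-wise into a neighbour block and a self block gives a folded weight $[W_y^{(\text{neigh})} W_{\text{neigh}},\ W_y^{(\text{self})} W_{\text{self}}]$. The two forms therefore have the same expressive power and differ only in parameterisation. The numerical check below is a sketch that assumes each branch projects to `d_out` and `lin_update` maps `2 * d_out` to `d_out`; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_self, d_neigh, d_out = 16, 8, 32

# Assumed layer sizes for the three-layer update (no activation in between).
lin_neigh = nn.Linear(d_neigh, d_out)
lin_self = nn.Linear(d_self, d_out)
lin_update = nn.Linear(2 * d_out, d_out)

h_neigh = torch.randn(5, d_neigh)
h_self = torch.randn(5, d_self)

with torch.no_grad():
    y_three = lin_update(torch.cat([lin_neigh(h_neigh), lin_self(h_self)], dim=-1))

    # nn.Linear stores weight as (out_features, in_features); split W_y by input columns.
    W_y_neigh, W_y_self = lin_update.weight.split(d_out, dim=1)
    # Fold the three affine maps into one weight over CONCAT(h_neigh, h_self).
    W_prime = torch.cat([W_y_neigh @ lin_neigh.weight, W_y_self @ lin_self.weight], dim=1)
    b_prime = W_y_neigh @ lin_neigh.bias + W_y_self @ lin_self.bias + lin_update.bias
    # Same computation as a single nn.Linear(d_neigh + d_self, d_out) with these parameters.
    y_single = torch.cat([h_neigh, h_self], dim=-1) @ W_prime.T + b_prime

print(torch.allclose(y_three, y_single, atol=1e-5))  # True
```

Any benefit of the factored form would thus have to come from optimisation behaviour or from adding a nonlinearity between the layers, not from extra capacity.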
Thank you for any explanation you can provide for the benefits of the slightly more complex architecture implemented in deepsnap!
Hi,
The idea of using separate linear layers for the node itself and for its neighbours is mainly derived from the Relational GCN, which is briefly described on p. 13 of these slides. Adding another layer at the end can play the role of a post-processing layer, which is introduced on p. 52 of these slides. Using a post-processing layer can often be helpful, as shown in this paper.
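For reference, the Relational GCN propagation rule (Schlichtkrull et al.) keeps one weight matrix $W_r$ per relation type plus a dedicated self-connection weight $W_0$, with $c_{i,r}$ a normalisation constant such as $|\mathcal{N}_i^r|$:

$$
h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)
$$

Reading `lin_neigh` as playing the role of $W_r$ for one message type and `lin_self` as $W_0$ is an interpretation, not something stated in the deepsnap source.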