Efficiency Analysis of SGC
liu-jc opened this issue · 3 comments
I found that the training-time improvements of SGC vary across datasets. For example, SGC trains 28 times faster than GCN on Pubmed, but less than 5 times faster on TWITTER-WORLD, Cora, and Citeseer. What explains these results? Is there a theoretical guarantee, or are they only empirical?
My other question is how to quantitatively analyze the training time of these GCN variants. Are there established approaches for this? Analyzing only the time complexity of the matrix multiplications in forward/backward propagation seems insufficient, since it ignores the cost of the non-linear transformations.
I would appreciate any answers or insights you could provide on the above questions.
Hi there,
Thank you for your interest in SGC.
The efficiency of SGC comes from several different aspects:
- SGC precomputes the feature propagation `A^K X`.
- SGC collapses the K linear transformations into one. Namely, there is only one weight matrix, and only one matrix multiplication is required for the forward pass.
- Once the feature propagation is precomputed, only the training data needs to be processed during forward and backward propagation. In contrast, GCN computes over both the training and test data because of its transductive nature.
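The precomputation step can be sketched as follows. This is a minimal NumPy illustration, not the repository's implementation: it assumes dense arrays and the symmetric normalization `S = D^{-1/2}(A + I)D^{-1/2}` from the paper, and the variable names are illustrative.

```python
import numpy as np

def sgc_precompute(A, X, K):
    """Return S^K X, where S is the symmetrically normalized
    adjacency with self-loops: S = D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    S = D_inv_sqrt @ A_hat @ D_inv_sqrt
    for _ in range(K):                      # K propagation steps
        X = S @ X
    return X

# Toy example: a 4-node path graph with 3 features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)
XK = sgc_precompute(A, X, K=2)
```

After this one-off step, each training iteration only multiplies the training rows of `XK` by a single weight matrix (e.g. `XK[train_idx] @ W`), which is where the per-epoch savings over a K-layer GCN come from.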
When analyzing the speedup of SGC, you have to take all of these factors into account. The characteristics of each dataset therefore matter a great deal and can lead to different amounts of acceleration.
Hope this answer helps.
An additional reason is that TWITTER-WORLD and TWITTER-NA don't fit into our GPUs and the geographconv code base uses CPUs instead.
The speedups on GPUs and CPUs can be different.
Thanks for your answers. Could you provide more details of the SGC and GCN models used when reporting their efficiency in your paper? For example, the number of layers and the number of hidden units in GCN, as well as the K you used in SGC.