huazai1992/HGCN

Some paper nodes' representations in CORA are all zero vectors

Closed this issue · 4 comments

Thanks for sharing the code!
When I inspected the node representations in '/data/net_cora/P.feat.npz', I found that 1347 paper nodes have all-zero feature vectors (CORA has 19396 paper nodes in total).
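For anyone who wants to reproduce this kind of check, here is a minimal sketch of counting all-zero rows. It runs on a small made-up matrix rather than the real file, because the thread does not say how `P.feat.npz` is laid out internally (dense array or sparse matrix). The helper name `count_zero_rows` and the toy data are hypothetical.

```python
def count_zero_rows(rows):
    """Return (number of all-zero rows, total number of rows)."""
    zero = 0
    total = 0
    for row in rows:
        total += 1
        # A row counts as "all zero" when no entry differs from 0.
        if not any(value != 0 for value in row):
            zero += 1
    return zero, total


# Hypothetical stand-in for the feature matrix: 4 nodes, 3 features each.
toy_features = [
    [0, 1, 0],
    [0, 0, 0],
    [2, 0, 1],
    [0, 0, 0],
]

zero, total = count_zero_rows(toy_features)
print(f"{zero} of {total} rows are all zero ({100 * zero / total:.2f}%)")

# Share implied by the numbers reported in this issue.
reported_share = 100 * 1347 / 19396
print(f"reported share: {reported_share:.2f}%")
```

If the features are already in a dense NumPy array `X`, the same count can be written as `(~X.any(axis=1)).sum()`.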

We extract a bag-of-words representation of the paper titles as the content feature for each paper node. The vocabulary is limited to 300 words rather than every word that appears in the titles, so papers whose titles contain none of these 300 words end up with all-zero vectors.
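The effect described above can be illustrated with a small sketch. It assumes the vocabulary is chosen as the most frequent words and that the features are raw word counts; the thread does not say how the 300 words were actually selected, so both choices, the function names, and the toy titles are hypothetical.

```python
from collections import Counter


def build_vocabulary(titles, max_words):
    """Keep the max_words most frequent words (an assumed selection rule)."""
    counts = Counter(word for title in titles for word in title.lower().split())
    return [word for word, _ in counts.most_common(max_words)]


def bag_of_words(title, vocabulary):
    """Count how often each vocabulary word occurs in the title."""
    counts = Counter(title.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical titles; the vocabulary is capped at 2 words instead of 300.
titles = [
    "graph neural networks",
    "graph embedding methods",
    "neural graph models",
    "quantum annealing",
]

vocabulary = build_vocabulary(titles, max_words=2)
vectors = [bag_of_words(title, vocabulary) for title in titles]

for title, vector in zip(titles, vectors):
    print(f"{title!r}: {vector}")
```

The last title shares no word with the truncated vocabulary, so its vector is all zeros, which is the situation the all-zero paper nodes are in.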

Thanks for your reply.
Would this affect the performance, given that 6.94% of the paper nodes have no features? Also, how do you split the training and test sets, and how many nodes are in the test set?

I think it would not affect the performance. In our paper, we mainly use one-hot encodings of the labels of target-type nodes as the features (i.e., *.feat.label). For all methods, we randomly sample 80% of the labeled nodes as the training set and use the remaining 20% as the test set.
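A rough sketch of the two ideas in this reply, one-hot label features and a random 80/20 split of labeled nodes, might look like the following. It is not taken from the repository: the function names, the seed, and the toy node IDs are assumptions made for illustration.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def split_labeled_nodes(node_ids, train_fraction=0.8, seed=0):
    """Shuffle the labeled node IDs and cut them into train and test parts."""
    shuffled = list(node_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical example: 10 labeled nodes, 4 classes.
labeled_nodes = list(range(10))
train_ids, test_ids = split_labeled_nodes(labeled_nodes)

print("train:", sorted(train_ids))
print("test:", sorted(test_ids))
print("one-hot for label 2:", one_hot(2, num_classes=4))
```

With this scheme the test set size is simply 20% of however many nodes carry labels; the thread does not give that count.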

OK, thanks for your reply!