Some paper nodes' representations in CORA are all zero vectors

Question

Closed this issue 4 years ago · 4 comments

Thanks for sharing the code!
When I inspect the node features in '/data/net_cora/P.feat.npz', I find that 1347 paper nodes have all-zero feature vectors (out of 19396 paper nodes in total in CORA).
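For anyone who wants to reproduce this kind of count, here is a minimal sketch. It runs on a small toy matrix rather than the real file, and the helper name `count_zero_rows` is made up for illustration. How 'P.feat.npz' is stored is an assumption on my part: if it was written with `scipy.sparse.save_npz`, it could be read with `scipy.sparse.load_npz(...)` and converted with `.toarray()` before calling the helper.

```python
import numpy as np


def count_zero_rows(features):
    """Return the number of rows whose entries are all zero."""
    dense = np.asarray(features)
    # any(axis=1) is True for rows with at least one non-zero entry.
    return int(np.sum(~dense.any(axis=1)))


# Toy stand-in for the real feature matrix (5 nodes, 4 features).
toy = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 3, 0],
])

n_zero = count_zero_rows(toy)
share = n_zero / toy.shape[0]
print(n_zero, f"{share:.2%}")
```

On the numbers reported above, the same ratio is 1347 / 19396, which is roughly 6.94%.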

Answer 1 · 2020-11-18T02:23:21.000Z

We extract a bag-of-words representation of the paper titles as the content feature of each paper node. The vocabulary contains only 300 words, not every word that appears in the titles, so papers whose titles contain none of these words end up with an all-zero vector.
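A small sketch of why a capped vocabulary produces all-zero rows. It assumes the vocabulary is simply the most frequent title words, which this thread does not state, and it uses a vocabulary of 2 words instead of 300 to keep the toy example readable. The function names are hypothetical, not taken from the repository.

```python
from collections import Counter


def build_vocab(titles, max_words):
    """Keep the max_words most frequent lowercase tokens across all titles."""
    counts = Counter(word for title in titles for word in title.lower().split())
    return [word for word, _ in counts.most_common(max_words)]


def bag_of_words(title, vocab):
    """Count how often each vocabulary word occurs in one title."""
    counts = Counter(title.lower().split())
    # A Counter returns 0 for words that are absent from the title.
    return [counts[word] for word in vocab]


titles = [
    "graph neural networks",
    "graph embedding methods",
    "neural graph models",
    "quantum cryptography",
]

vocab = build_vocab(titles, max_words=2)
vectors = [bag_of_words(title, vocab) for title in titles]

for title, vector in zip(titles, vectors):
    print(title, vector)
```

The last title shares no word with the two-word vocabulary, so its vector is all zeros, which is the same effect as the zero rows in the paper features.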

Answer 2 · 2020-11-18T02:42:16.000Z

Thanks for your reply.
Would this affect performance, given that 6.94% of the paper nodes have no features? Also, how do you split the training and test sets, and how many nodes are in the test set?

Answer 3 · 2020-11-18T02:53:00.000Z

I don't think it affects the performance. In our paper, we mainly use one-hot encodings of the labels of target-type nodes as the features (i.e., *.feat.label). For all methods, we randomly sample 80% of the labeled nodes as the training set and use the remaining 20% as the test set.
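As a rough illustration of the two ideas above (one-hot label features and a random 80/20 split), here is a sketch on toy data. The helper names, the seed, and the rounding of the split size are assumptions, not the repository's actual code, and the thread does not say how labels of test nodes are handled in the feature file.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def random_split(node_ids, train_fraction=0.8, seed=0):
    """Shuffle the labeled node ids and cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(node_ids)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Toy data: 10 labeled nodes with 3 classes.
labeled_nodes = np.arange(10)
labels = np.array([0, 2, 1, 0, 1, 2, 2, 0, 1, 0])

features = one_hot(labels, num_classes=3)
train_ids, test_ids = random_split(labeled_nodes)

print(features.shape, len(train_ids), len(test_ids))
```

With 10 labeled nodes this gives 8 training nodes and 2 test nodes; the actual test-set size depends on how many labeled nodes the dataset has.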

Answer 4 · 2020-11-18T02:57:13.000Z

OK, thanks for your reply!