PetarV-/DGI

Questions about data sets?

linzhi123 opened this issue · 8 comments

Hello, how did you create the datasets in the data folder?

Hello,

I'm not sure I understand your question. We use standard benchmark datasets for all experiments reported in our paper. For example, Cora, Citeseer and Pubmed can all be found in Thomas Kipf's GCN repository:
https://github.com/tkipf/gcn/tree/master/gcn/data

Thanks,
Petar

Thanks

I mean, how were the files in these datasets created?
@PetarV-

Perhaps this description can help?
https://github.com/kimiyoung/planetoid/blob/master/README.md

I'm sorry I cannot be of much more help than that; I didn't take part in preparing the files.
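
For anyone with the same question, here is a minimal sketch of how a small custom graph could be written in the `ind.<name>.*` layout that the Planetoid README describes (`x`, `y`, `allx`, `ally`, `tx`, `ty`, `graph`, `test.index`). It is not the script used to build Cora, Citeseer or Pubmed. The function name `write_planetoid_style`, the dataset name `toy` and the node ordering (labeled training nodes first, test nodes last) are assumptions made for illustration, and the output has not been checked against the loader in this repository.

```python
import os
import pickle
import tempfile
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def write_planetoid_style(out_dir, name, features, onehot, edges, n_train, n_test):
    """Write a toy dataset as ind.<name>.{x,y,allx,ally,tx,ty,graph,test.index}.

    Assumes (for simplicity) that nodes are already ordered as:
    labeled training nodes, then other non-test nodes, then test nodes.
    """
    n = features.shape[0]
    split = n - n_test  # index of the first test node

    # graph: {node index: [neighbor indices]}, stored for an undirected graph
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    objects = {
        "x": sp.csr_matrix(features[:n_train]),   # labeled training features
        "y": onehot[:n_train],                    # their one-hot labels
        "allx": sp.csr_matrix(features[:split]),  # all non-test features
        "ally": onehot[:split],
        "tx": sp.csr_matrix(features[split:]),    # test features
        "ty": onehot[split:],
        "graph": graph,
    }
    for suffix, obj in objects.items():
        with open(os.path.join(out_dir, f"ind.{name}.{suffix}"), "wb") as f:
            pickle.dump(obj, f)

    # test.index is a plain text file with one test node index per line
    with open(os.path.join(out_dir, f"ind.{name}.test.index"), "w") as f:
        for i in range(split, n):
            f.write(f"{i}\n")


def read_part(out_dir, name, suffix):
    """Load one of the pickled parts back from disk."""
    with open(os.path.join(out_dir, f"ind.{name}.{suffix}"), "rb") as f:
        return pickle.load(f)


# Toy example: a 6-node path graph, 4 features per node, 2 classes
rng = np.random.default_rng(0)
features = rng.random((6, 4)).astype(np.float32)
onehot = np.eye(2, dtype=np.int32)[[0, 1, 0, 1, 0, 1]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

out_dir = tempfile.mkdtemp()
write_planetoid_style(out_dir, "toy", features, onehot, edges, n_train=2, n_test=2)
```

The original benchmark files were pickled under Python 2, which is why GCN-style loaders pass `encoding='latin1'` when reading them under Python 3; files written as above do not need that.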

Ok, thank you for your answer.

Hi Petar,
Could you provide the Reddit and PPI datasets in the format used in the code?

Thanks

Hello,

For PPI, the preprocessing code found in:

https://github.com/PetarV-/GAT/blob/master/utils/process_ppi.py

should be enough to get you started.

For Reddit, we were unable to get the PyTorch version of GraphSAGE to cooperate, and thus we used the TensorFlow version:

https://github.com/williamleif/GraphSAGE

as a starting point, and modified it to support DGI and load Reddit. Currently there are no plans to release this modified codebase.

Thanks,
Petar

Thanks a lot!