benedekrozemberczki/SimGNN

About dataset

BenedictWongBJUT opened this issue · 6 comments

I am really interested in this amazing work, but I don't understand how the datasets are generated or processed. Are the training and test data generated from public datasets (such as Linux and AIDS, mentioned in the paper)? I would like to know how syngen.py works and what it outputs.

Thanks a lot.

I am also curious about this question.

I have figured it out. You can download the datasets from this link (https://drive.google.com/drive/folders/1lY3pqpnUAK0H9Tgjyh7tlMVYy0gYPthC?usp=sharing). With NetworkX (nx.read_gexf) you can load each original file (e.g. 1.gexf) as a graph object. Then use syngen.py to compute the pairwise similarity and generate the dataset in the same pass. Finally, split the result into training, validation, and test sets.
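For anyone who wants to try these steps without syngen.py, here is a minimal sketch of the same pipeline. It is not the code from syngen.py, which has not been posted in this thread; it is a stand-in that uses NetworkX's exact graph edit distance as the pairwise score. The function names, the `type` node attribute, the 60/20/20 split, and the one-file-per-pair layout are all my assumptions. The JSON keys (`graph_1`, `graph_2`, `labels_1`, `labels_2`, `ged`) follow the input format of this repository as I understand it, so please check them against the README before training.

```python
import itertools
import json
import random
from pathlib import Path

import networkx as nx


def load_graph(path):
    """Load one .gexf file (e.g. 1.gexf) as a NetworkX graph."""
    return nx.read_gexf(path)


def make_pair(g1, g2, label_key="type"):
    """Build one record for a pair of graphs.

    label_key is an assumed node attribute name. Nodes that lack it
    get the label None, so they all match each other.
    """
    # Relabel nodes to 0..n-1 so the edge lists contain plain integers.
    g1 = nx.convert_node_labels_to_integers(g1)
    g2 = nx.convert_node_labels_to_integers(g2)

    def same_label(a, b):
        return a.get(label_key) == b.get(label_key)

    # Exact GED is exponential in graph size: only practical for small graphs.
    ged = nx.graph_edit_distance(g1, g2, node_match=same_label)
    return {
        "graph_1": [[u, v] for u, v in g1.edges()],
        "graph_2": [[u, v] for u, v in g2.edges()],
        "labels_1": [g1.nodes[n].get(label_key) for n in g1.nodes()],
        "labels_2": [g2.nodes[n].get(label_key) for n in g2.nodes()],
        "ged": int(ged),
    }


def build_pairs(graphs):
    """Create one record for every unordered pair of graphs."""
    return [make_pair(a, b) for a, b in itertools.combinations(graphs, 2)]


def split_pairs(pairs, train=0.6, val=0.2, seed=0):
    """Shuffle and split into train/validation/test (assumed 60/20/20)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def write_pairs(pairs, out_dir):
    """Write each record to its own JSON file: 0.json, 1.json, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, pair in enumerate(pairs):
        (out / f"{i}.json").write_text(json.dumps(pair))
```

A typical run would load every `.gexf` file with `load_graph`, pass the list to `build_pairs`, split with `split_pairs`, and call `write_pairs` once per split. Note that this splits pairs, not graphs, so the same graph can appear in both training and test pairs; split the graph list first if you need to avoid that.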

Thank you very much!

I had the same problem. If it's okay with you, could I see the code that uses syngen.py to process AIDS and the other datasets?

Did you write syngen.py yourself? If so, could you share it?