About dataset

Question

BenedictWongBJUT opened this issue 3 years ago · 6 comments

I am really interested in this amazing work, but I don't understand how the datasets are generated (or processed). Are the training and test data generated from public datasets (such as LINUX and AIDS, mentioned in the paper)? I would also like to know how syngen.py works and what it outputs.

Thanks a lot.

Answer 1 · 2021-06-23T01:21:30.000Z


Answer 2 · 2021-07-22T03:00:01.000Z

I am also curious about this question.

Answer 3 · 2021-08-07T09:12:07.000Z

I have already figured it out. You can download the datasets from this link (https://drive.google.com/drive/folders/1lY3pqpnUAK0H9Tgjyh7tlMVYy0gYPthC?usp=sharing). With the help of NetworkX (nx.read_gexf), you can load each original file (e.g. 1.gexf) into a graph object. Then use syngen.py to compute the pairwise similarity and generate the dataset at the same time. Finally, split the result into training, validation, and test sets.
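
For anyone following along, here is a rough sketch of those three steps (load, score pairs, split). syngen.py itself is not shown in this thread, so the similarity step below is only a stand-in: it assumes exact graph edit distance from NetworkX (nx.graph_edit_distance), mapped to a score of 1 / (1 + GED), which may differ from what syngen.py actually computes. The function names, the 80/10/10 split ratios, and the choice to split the scored pairs (rather than the graphs) are all assumptions made for illustration.

```python
import itertools
import random
from pathlib import Path

import networkx as nx


def load_graphs(folder):
    """Read every .gexf file in `folder` into a NetworkX graph, keyed by file stem."""
    paths = sorted(Path(folder).glob("*.gexf"))
    return {p.stem: nx.read_gexf(str(p)) for p in paths}


def pairwise_similarity(graphs):
    """Score every unordered pair of graphs.

    ASSUMPTION: similarity = 1 / (1 + exact graph edit distance).
    This is a stand-in and not necessarily what syngen.py does.
    Exact GED is exponential in graph size, so keep the graphs small.
    """
    pairs = []
    for a, b in itertools.combinations(sorted(graphs), 2):
        ged = nx.graph_edit_distance(graphs[a], graphs[b])
        pairs.append((a, b, 1.0 / (1.0 + ged)))
    return pairs


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle `items` reproducibly and cut them into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

Usage would look like split_dataset(pairwise_similarity(load_graphs("data"))), where "data" is a hypothetical folder holding the downloaded .gexf files. Node and edge attributes are ignored by the default GED call, so pass node_match or edge_match if labels matter for your dataset.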

Answer 4 · 2021-08-08T04:15:09.000Z

Thank you very much!

Answer 5 · 2021-10-14T08:25:19.000Z

I had the same problem. If it's okay with you, could I see the code that uses syngen.py to process AIDS and the other datasets?

Answer 6 · 2021-11-08T08:40:15.000Z

Did you write syngen.py yourself? If so, could you share it?