acbull/pyHGT

Med and OAG datasets

Weesley743 opened this issue · 4 comments

Hi, there seems to be only the CS dataset on Google Drive. How can I get the Med and OAG datasets?
I'm also confused about the graph_NN and graph_ML datasets. How are they related to the two datasets (Med, OAG) mentioned in the paper?

I used graph.node_feature['paper'] to print the node features of the graph_CS dataset, and the result shows 544243 records, but Table 1 in the paper shows 5597605 paper nodes. How can I extract the features of all paper nodes? Could you give me some advice? Thank you so much.

Hi, as Google Drive only provides 15 GB of storage, I currently only provide the CS dataset. The ML and NN datasets are sub-graphs of the CS dataset that contain only the papers whose fields fall within ML and NN, respectively. I'll consider adding Med and the whole OAG later (after expanding the Google Drive storage).
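To illustrate what "sub-graph by field" means, here is a minimal sketch, assuming a toy representation where each paper ID maps to a set of field names and citations are (citing, cited) pairs. The function `field_subgraph` and these data structures are hypothetical; this is not how pyHGT actually builds graph_ML or graph_NN.

```python
from typing import Dict, List, Set, Tuple


def field_subgraph(
    paper_fields: Dict[str, Set[str]],
    citations: List[Tuple[str, str]],
    target_field: str,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Hypothetical helper: keep only papers tagged with target_field,
    plus the citation edges whose endpoints were both kept."""
    kept = {pid for pid, fields in paper_fields.items() if target_field in fields}
    edges = [(src, dst) for src, dst in citations if src in kept and dst in kept]
    return kept, edges


# Toy usage: p3 is not an ML paper, so it and its edges are dropped.
paper_fields = {
    "p1": {"Machine learning"},
    "p2": {"Computer vision", "Machine learning"},
    "p3": {"Databases"},
}
citations = [("p1", "p2"), ("p2", "p3"), ("p3", "p1")]
kept, edges = field_subgraph(paper_fields, citations, "Machine learning")
```

Here `kept` is `{"p1", "p2"}` and `edges` is `[("p1", "p2")]`, since only that edge has both endpoints inside the ML field.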

As for the dataset scale, Table 1 in the paper reports the original scale. As shown in the pre_process.py code (line 47), I add a filter step that only keeps papers whose citation count is above a threshold, to increase the network density. You can delete that line to extract features for all paper nodes.
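The effect of that filter can be sketched as follows, assuming papers are plain IDs and citations are (citing, cited) pairs. The names `filter_by_citations`, `min_citations`, and `keep_all` are hypothetical and this is not the code in pre_process.py; `keep_all=True` stands in for deleting the filter line.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_citations(citations: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many times each paper is cited; edges are (citing, cited)."""
    return Counter(cited for _citing, cited in citations)


def filter_by_citations(
    paper_ids: List[str],
    citations: List[Tuple[str, str]],
    min_citations: int,
    keep_all: bool = False,
) -> List[str]:
    """Hypothetical filter: keep papers cited at least min_citations times.
    keep_all=True skips the filter, like removing the line in pre_process.py."""
    if keep_all:
        return list(paper_ids)
    counts = count_citations(citations)
    # Counter returns 0 for papers that are never cited.
    return [pid for pid in paper_ids if counts[pid] >= min_citations]


papers = ["p1", "p2", "p3"]
cites = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
dense = filter_by_citations(papers, cites, min_citations=2)
everything = filter_by_citations(papers, cites, min_citations=2, keep_all=True)
```

With these toy edges, `p1` is cited twice, `p2` once, and `p3` never, so `dense` is `["p1"]` while `everything` keeps all three papers. This is why the released graph has far fewer paper nodes than the original scale in Table 1.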

Thanks for your detailed reply.


Hi! Any updates on uploading the whole OAG dataset?