How did you preprocess datasets?

Question


hazdzz opened this issue 3 years ago · 1 comment

Here is the original PubMed dataset: https://linqs-data.soe.ucsc.edu/public/Pubmed-Diabetes.tgz
How did you preprocess this dataset into the npz form used in this repository? Are the feature files of the datasets in this repository already normalized?

Answer 1 · 2022-01-02T12:04:31.000Z

Unlike the Cora/CoraML datasets, we did not process PubMed ourselves; we used the already-preprocessed version from previous papers. The features for each node are just the TF-IDF word features.
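
If you want to check the normalization question yourself, here is a minimal sketch (not the repository's code) that lists the arrays stored in an npz file and tests whether the rows of a feature matrix have unit L1 or L2 norm. The helper names `describe_npz` and `row_norm_summary` are made up for illustration, and the sketch assumes the features are available as a dense numeric array. If the file stores them in another layout (for example as sparse matrix components), you would need to rebuild the matrix first.

```python
import numpy as np


def describe_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an .npz archive.

    Assumes the archive holds plain numeric arrays (no pickled objects).
    """
    with np.load(path) as data:
        return {key: (data[key].shape, str(data[key].dtype)) for key in data.files}


def row_norm_summary(features):
    """Check whether every non-empty row has unit L1 or unit L2 norm."""
    features = np.asarray(features, dtype=float)
    l1 = np.abs(features).sum(axis=1)
    l2 = np.sqrt((features ** 2).sum(axis=1))
    nonzero = l1 > 0  # ignore all-zero rows, which cannot be normalized
    return {
        "l1_normalized": bool(np.allclose(l1[nonzero], 1.0)),
        "l2_normalized": bool(np.allclose(l2[nonzero], 1.0)),
    }
```

Calling `describe_npz` on a dataset file shows which keys it contains. Once you know which key holds the features, pass that array to `row_norm_summary`. If neither flag is true, the features are probably raw TF-IDF weights without row normalization.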