How did you preprocess datasets?
hazdzz opened this issue · 1 comment
hazdzz commented
Here is the original PubMed dataset: https://linqs-data.soe.ucsc.edu/public/Pubmed-Diabetes.tgz
How did you preprocess this dataset into the npz format used in this repository? Also, are the node features in each dataset's file already normalized?
abojchevski commented
Unlike the Cora/CoraML datasets, we did not process PubMed ourselves; instead, we used the preprocessed version from previous papers. The features for each node are just the TF/IDF word features.
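The reply does not say whether the stored TF/IDF rows were normalized afterwards. Below is a minimal pure-Python sketch of one common TF-IDF variant (raw term count times log(N / df)) plus a check for unit-norm rows. The weighting scheme and the helper names (`tfidf`, `rows_are_unit_norm`, `l2_normalize`) are assumptions for illustration only. The PubMed features may have been built with a different variant, and this is not the repository's own preprocessing code.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF for tokenized documents, returned as sparse rows (word -> weight).

    Assumed variant: weight = raw count * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return rows


def _norm(row, p):
    # L1 norm for p == 1, otherwise L2 norm.
    if p == 1:
        return sum(abs(v) for v in row.values())
    return math.sqrt(sum(v * v for v in row.values()))


def rows_are_unit_norm(rows, p=2, tol=1e-6):
    """True if every row with nonzero norm has unit Lp norm (p = 1 or 2)."""
    for row in rows:
        norm = _norm(row, p)
        if norm == 0:
            continue  # all-zero rows cannot be normalized; skip them
        if abs(norm - 1.0) > tol:
            return False
    return True


def l2_normalize(rows):
    """Scale each nonzero row to unit L2 norm; all-zero rows are copied as-is."""
    out = []
    for row in rows:
        norm = _norm(row, 2)
        out.append({w: v / norm for w, v in row.items()} if norm > 0 else dict(row))
    return out


if __name__ == "__main__":
    docs = [["insulin", "glucose", "insulin"], ["glucose", "mice"], ["insulin", "mice", "diet"]]
    rows = tfidf(docs)
    print(rows_are_unit_norm(rows))                # raw TF-IDF rows
    print(rows_are_unit_norm(l2_normalize(rows)))  # after row normalization
```

The same row-norm check could be applied to the feature matrix loaded from the npz file to see whether its rows already have unit L1 or L2 norm.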