Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification [arXiv]
We use the same benchmark datasets as Yao, Mao, and Luo 2019. For the MR, Ohsumed, and 20NG datasets, we follow the same train/test splits and data preprocessing as Kim 2014 and Yao, Mao, and Luo 2019. We thank the authors for their work.
The R8 and R52 datasets are only available in a preprocessed version that lacks punctuation and explicit sample names. Because we construct graphs from documents with sentence segmentation information, we re-extract these two datasets from the original Reuters-21578 corpus.
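Why punctuation matters can be shown with a minimal sketch. The regex splitter below is an illustrative assumption, not the sentence segmenter used by this repository; it only demonstrates that once punctuation is stripped (as in the preprocessed R8/R52 release), sentence boundaries can no longer be recovered.

```python
import re
from typing import List

# Naive boundary rule (assumption for illustration): split on whitespace
# that follows a '.', '!' or '?'.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split raw text into sentences using terminal punctuation."""
    return [s for s in _SENT_BOUNDARY.split(text.strip()) if s]


raw = "Oil prices rose sharply. Analysts expect further gains! Will OPEC respond?"
# A punctuation-free version of the same text, like the preprocessed release.
stripped = "oil prices rose sharply analysts expect further gains will opec respond"

print(split_sentences(raw))       # three sentences
print(split_sentences(stripped))  # a single "sentence": boundaries are lost
```

A real pipeline would usually rely on a trained tokenizer rather than a regex, since abbreviations such as "U.S." would fool this rule.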
You can download the dataset here:
- Re-extract the R8 and R52 datasets:
python re-extract_data/mk_R8_R52.py --name R8
- Remove words:
python remove_words.py --name R8
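The word-removal step is not specified further here. A minimal sketch follows, assuming it works like the common preprocessing in this line of work: dropping stop words and words that are rare across the corpus. The function name `filter_vocabulary`, the tiny stop-word list, and the `min_freq` threshold are all hypothetical and may not match `remove_words.py`.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical stop-word list; a real pipeline would use a full list (e.g. NLTK's).
STOP_WORDS = {"the", "a", "of", "to", "and", "in"}


def filter_vocabulary(docs: List[str], min_freq: int = 2,
                      stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Drop stop words and words seen fewer than `min_freq` times in the corpus."""
    stop = set(stop_words)
    tokenized = [doc.lower().split() for doc in docs]
    # Corpus-level frequency of every (lowercased) token.
    freq = Counter(tok for toks in tokenized for tok in toks)
    return [
        " ".join(t for t in toks if t not in stop and freq[t] >= min_freq)
        for toks in tokenized
    ]


docs = ["The price of oil rose", "Oil exports rose in March", "The price fell"]
print(filter_vocabulary(docs))
```

Here `exports`, `march`, and `fell` occur only once and are dropped along with the stop words, leaving `['price oil rose', 'oil rose', 'price']`.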
Before running the code, change Your_path=/data/project/yinhuapark/ssl/ to your own path.
- Create co-occurrence pairs for each document:
python ssl_make_graphs/create_cooc_document.py --name R8
- Construct a graph for each document and store the graphs in an InMemoryDataset:
python ssl_make_graphs/PygDocsGraphDataset.py --name R8
- Train the model:
python ssl_graphmodels/pyg_models/train_docs.py --name R8
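The co-occurrence and graph-construction steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `create_cooc_document.py` or `PygDocsGraphDataset.py`: the sliding-window size, computing windows within each sentence, and the helper names `cooc_pairs` and `build_graph` are all hypothetical. Each unique word becomes one node, so a word shared by two sentences links them in the document graph, and edges are emitted in both directions in the `[sources, targets]` (COO) layout that PyTorch Geometric uses for `edge_index`.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooc_pairs(sentences: List[List[str]], window: int = 3) -> Set[Tuple[str, str]]:
    """Collect unordered word pairs that co-occur inside a sliding window
    within each sentence (window size is an illustrative assumption)."""
    pairs: Set[Tuple[str, str]] = set()
    for tokens in sentences:
        # Sentences shorter than the window form a single window.
        for start in range(max(1, len(tokens) - window + 1)):
            for u, v in combinations(tokens[start:start + window], 2):
                if u != v:
                    pairs.add((min(u, v), max(u, v)))
    return pairs


def build_graph(sentences: List[List[str]],
                window: int = 3) -> Tuple[Dict[str, int], List[List[int]]]:
    """Assign each unique word a node id and return (vocab, edge_index),
    where edge_index = [sources, targets] holds both directions of each pair."""
    vocab: Dict[str, int] = {}
    for tokens in sentences:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    src: List[int] = []
    dst: List[int] = []
    for u, v in sorted(cooc_pairs(sentences, window)):
        src += [vocab[u], vocab[v]]
        dst += [vocab[v], vocab[u]]
    return vocab, [src, dst]


sentences = [["oil", "prices", "rose"], ["opec", "cut", "oil", "output"]]
vocab, edge_index = build_graph(sentences, window=2)
print(vocab)
print(edge_index)
```

With `window=2` the two sentences yield five word pairs (ten directed edges), and `oil` is a single node connecting both sentences. In a PyG pipeline, one would typically convert `edge_index` with `torch.tensor(edge_index, dtype=torch.long)`, wrap it with node features in `torch_geometric.data.Data(x=..., edge_index=...)`, and let an `InMemoryDataset` subclass collate the per-document graphs.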