We provide (1) datasets, as well as source code for (2) the benchmark task, (3) the downstream tasks, and (4) the ablation studies of WHATsNET from the paper: Classification of Edge-Dependent Labels of Nodes in Hypergraphs, Minyoung Choe, Sunwoo Kim, Jaemin Yoo, and Kijung Shin, KDD 2023.
(1) Datasets
We provide six real-world datasets for our new benchmark task (/dataset/) and preprocessing code (/dataset/PreprocessCode/):
- Co-authorship : DBLP and AMinerAuthor
- Email : Enron and Eu
- StackOverflow: Biology and Physics
# File Organization
|__ hypergraph.txt # used for constructing the hypergraph; the i-th line lists the nodes v_1, v_2, ... contained in the i-th hyperedge
|__ hypergraph_pos.txt # used for edge-dependent node labels; the i-th line lists the labels of v_1, v_2, ... with respect to the i-th hyperedge (same order as hypergraph.txt)
|__ [valid/test]_hindex_0.txt # used for the train/valid/test split
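As an illustration of how the two parallel files fit together, here is a minimal loader sketch. The exact delimiter is an assumption (we assume comma-separated entries per line); the repo's own preprocessing code in /dataset/PreprocessCode/ is authoritative.

```python
# Minimal loader sketch for the format described above.
# Assumption: each line holds comma-separated entries; hypergraph.txt lists
# node ids and hypergraph_pos.txt lists the matching edge-dependent labels.

def load_hypergraph(edge_lines, label_lines):
    """Return parallel lists: hyperedges (node ids) and their node labels."""
    hyperedges = [line.strip().split(",") for line in edge_lines if line.strip()]
    labels = [line.strip().split(",") for line in label_lines if line.strip()]
    assert len(hyperedges) == len(labels), "files must align line-by-line"
    for nodes, labs in zip(hyperedges, labels):
        assert len(nodes) == len(labs), "one label per node in each hyperedge"
    return hyperedges, labels

# Toy example: two hyperedges with edge-dependent node labels.
edges = ["v1,v2,v3", "v2,v4"]
poss = ["0,1,1", "1,0"]
he, lb = load_hypergraph(edges, poss)
```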
Due to the large size of the datasets, only 'StackOverflowBiology' and 'DBLP' are currently visible in the anonymous GitHub repository
(2) Benchmark Task
We provide source code for running WHATsNET as well as nine competitors on all the benchmark datasets above:
- BaselineU and BaselineP
- HNHN, HGNN, HCHA, HAT, UniGCNII, HNN
- HST, AST
- WHATsNET
(3) Downstream Task
We apply our benchmark task to the following downstream tasks:
- Ranking Aggregation: https://github.com/uthsavc/hypergraph-halo-ranking
- Clustering: https://github.com/pnnl/HyperNetX/blob/master/tutorials/Tutorial%2011%20-%20Laplacians%20and%20Clustering.ipynb
- Product Return Prediction: https://github.com/jianboli/HyperGo
(4) Reproducing ALL results in Paper
- Ablation Studies of WHATsNET
- w/o WithinATT and WithinOrderPE
- WHATsNET-IM
- Positional encoding schemes
- Replacing WithinATT in updating node embeddings
- Number of inducing points
- Types of node centralities
- Visualization of WHATsNET
- Evaluation on Node Label Distribution Preservation of WHATsNET
Before training WHATsNET, node centralities must be calculated:
cd preprocess
python nodecentrality.py --algo [degree,kcore,pagerank,eigenvec] --dataname [name of dataset]
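The script preprocess/nodecentrality.py computes these centralities; as an illustrative sketch only (our own simplified definition, not the repo's code), degree centrality on a hypergraph can be taken as the number of hyperedges incident to each node:

```python
from collections import Counter

def hypergraph_degree(hyperedges):
    """Degree-centrality sketch: count how many hyperedges contain each node.
    Illustrative only; see preprocess/nodecentrality.py for the centralities
    actually used (degree, k-core, PageRank, eigenvector)."""
    deg = Counter()
    for edge in hyperedges:
        for v in set(edge):  # ignore duplicate entries within a hyperedge
            deg[v] += 1
    return dict(deg)

# Toy hypergraph: v2 appears in all three hyperedges.
degrees = hypergraph_degree([["v1", "v2", "v3"], ["v2", "v4"], ["v2", "v3"]])
```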
You can
- train WHATsNET,
- evaluate WHATsNET on the JSD of node-level label distributions,
- predict edge-dependent node labels with a trained WHATsNET, and
- analyze node embeddings for visualization (the concatenated embeddings of a node-hyperedge pair, and node embeddings before/after WithinATT)
by running the corresponding script below:
python [train.py/evaluate.py/predict.py/analysis.py] --vorder_input "degree_nodecentrality,eigenvec_nodecentrality,pagerank_nodecentrality,kcore_nodecentrality"
--embedder whatsnet --att_type_v OrderPE --agg_type_v PrevQ --att_type_e OrderPE --agg_type_e PrevQ
--dataset_name [name for dataset]
--num_att_layer [number of layers in WithinATT]
--num_layers [number of layers]
--bs [batch size]
--lr [learning rate]
--sampling [size of sampling incident hyperedges in aggregation at nodes]
[--analyze_att when running analysis.py]
--scorer sm --scorer_num_layers 1 --dropout 0.7 --optimizer "adam" --k 0 --gamma 0.99 --dim_hidden 64 --dim_edge 128 --dim_vertex 128 --epochs 100 --test_epoch 5
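The evaluation above uses the Jensen-Shannon divergence (JSD) between ground-truth and predicted node-level label distributions. A self-contained sketch of that metric (our own implementation for illustration, not the repo's evaluate.py):

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions,
    given as equal-length probability lists (base-2 logs, so 0 <= JSD <= 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

same = jsd([0.5, 0.5], [0.5, 0.5])  # identical distributions -> 0.0
far = jsd([1.0, 0.0], [0.0, 1.0])   # disjoint distributions -> 1.0
```

A lower JSD means the predicted labels better preserve each node's label distribution.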
You can run all ten models on each dataset (DBLP, AMinerAuthor, emailEnron, emailEu, StackOverflowBiology, StackOverflowPhysics) by
cd run
./run_[DBLP,AMinerAuthor,emailEnron,emailEu,StackOverflowBiology,StackOverflowPhysics].sh
Hyperparameters for each model were chosen by the best mean of Micro-F1 and Macro-F1 over the search space
We provide edge-dependent node labels predicted by WHATsNET as well as AST and HST in train_results/
We also provide shell scripts for the all-in-one process (train, predict, and evaluate on each downstream task) in run/DownstreamTask/
You can run the three downstream tasks with WHATsNET and the baselines by
- Ranking Aggregation: In the RankingAggregation directory, run ranking.py for the Halo2 game dataset and aminer_ranking.py for the AMiner dataset with author H-index
- Clustering: In the Clustering directory, run clustering.py for DBLP and clustering_aminer.py for AMiner
- Product Return Prediction: In the ProductReturnPred directory, make the synthetic dataset with makedata/Simulate data.ipynb and prepare the dataset for training models on our benchmark task through makedata/MakeHypergraph.ipynb. After training models, run makedata/prepare_predicted.py and evaluate them with script/main_prod.py
You can also run all ablation studies of WHATsNET by
cd run
./run_ablation.sh
./run_ablation_centrality.sh
The environment for running the code is specified in requirements.txt
Additionally, install the required libraries by following install.sh