TransZero

Official GitHub repository for the paper "Efficient Unsupervised Community Search with Pre-trained Graph Transformer", accepted at VLDB 2024.

License: MIT

Quick Start

0: unpack the bundled datasets: unzip dataset.zip
1: pre-train the model on Cora: python link_pretrain.py --dataset cora --batch_size 2708 --dropout 0.1 --hidden_dim 512 --hops 5 --n_heads 8 --n_layers 1 --pe_dim 3 --peak_lr 0.01 --weight_decay=1e-05 --epochs 100
2: run community search with the global binary search solver: python accuracy_globalsearch.py
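Steps 1 and 2 can also be scripted, which is convenient when changing one hyperparameter at a time. The sketch below assumes it is run from the repository root and only reuses the flags from the Cora command above; `CORA_DEFAULTS`, `build_pretrain_cmd` and `run_fast_start` are hypothetical names, and the flag meanings in the comments are inferred from the flag names rather than taken from the code.

```python
import subprocess
import sys

# Hypothetical defaults copied from the Cora command in step 1.
CORA_DEFAULTS = {
    "batch_size": 2708,   # Cora has 2708 nodes, so this is full-batch; likely differs per dataset
    "dropout": 0.1,
    "hidden_dim": 512,
    "hops": 5,            # presumably the number of neighborhood hops aggregated per node
    "n_heads": 8,         # attention heads
    "n_layers": 1,        # Transformer layers
    "pe_dim": 3,          # presumably the positional-encoding dimension
    "peak_lr": 0.01,
    "weight_decay": 1e-05,
    "epochs": 100,
}


def build_pretrain_cmd(dataset, **overrides):
    """Return the argv list for link_pretrain.py (hypothetical helper)."""
    params = {**CORA_DEFAULTS, **overrides}
    cmd = [sys.executable, "link_pretrain.py", "--dataset", dataset]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_fast_start(dataset="cora", **overrides):
    """Run step 1 (pre-training) and then step 2 (global search)."""
    subprocess.run(build_pretrain_cmd(dataset, **overrides), check=True)
    subprocess.run([sys.executable, "accuracy_globalsearch.py"], check=True)
```

For example, `run_fast_start("cora", epochs=50)` would repeat the Quick Start with fewer epochs; `check=True` stops at the first script that exits with an error.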

Train all datasets

bash ./training_all.sh

Test all datasets

bash ./test_all_global.sh >> ./logs/test_all_global.txt 2>&1 &
bash ./test_all_local.sh >> ./logs/test_all_local.txt 2>&1 &
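Both test scripts run in the background and append stdout and stderr to files under ./logs. A rough Python equivalent is sketched below; `launch_logged` and `run_all_tests` are hypothetical helpers, not part of the repository, and the sketch creates the log directory first in case it does not exist yet.

```python
import os
import subprocess


def launch_logged(cmd, log_path):
    """Start cmd in the background, appending stdout and stderr to log_path.

    Roughly mirrors `cmd >> log_path 2>&1 &` (hypothetical helper).
    """
    log_dir = os.path.dirname(log_path)
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)
    log_file = open(log_path, "a")  # "a" appends, like >>
    # stderr=STDOUT sends both streams to the same file, like 2>&1.
    proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    log_file.close()  # the child process keeps its own copy of the descriptor
    return proc


def run_all_tests():
    """Launch both test scripts concurrently and wait for their exit codes."""
    jobs = [
        launch_logged(["bash", "./test_all_global.sh"], "./logs/test_all_global.txt"),
        launch_logged(["bash", "./test_all_local.sh"], "./logs/test_all_local.txt"),
    ]
    return [job.wait() for job in jobs]
```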

Dataset and query generation

In the folder "dataset_dealing", we provide scripts that download the datasets and generate the queries automatically.
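The scripts in dataset_dealing are the reference for how queries are built. Purely to illustrate the idea, the sketch below assumes that a node's ground-truth community is the set of nodes sharing its class label (a common convention for labeled benchmarks such as Cora, but an assumption here) and samples query nodes together with their communities. `sample_queries` and its return format are hypothetical and do not reproduce the files the repository's scripts write.

```python
import random
from collections import defaultdict


def sample_queries(labels, num_queries, seed=0):
    """Sample (query_node, community) pairs, assuming communities are label classes.

    labels: list where labels[v] is the class label of node v.
    Returns a list of (query_node, sorted list of nodes sharing its label).
    """
    members = defaultdict(list)
    for node, label in enumerate(labels):
        members[label].append(node)
    rng = random.Random(seed)  # fixed seed so the query set is reproducible
    queries = rng.sample(range(len(labels)), num_queries)
    return [(q, sorted(members[labels[q]])) for q in queries]
```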

Due to space limits, we provide only the processed Cora, CiteSeer and Photo datasets. The other datasets can be generated as follows:

1: create the dataset folder, either with "unzip dataset.zip" or "mkdir dataset"
2: enter it with "cd dataset"
3: create a folder for each dataset, e.g., "mkdir texas"
4: use the scripts in dataset_dealing to download the datasets and generate the queries. Each dataset has two scripts, e.g., "texas_download_pyg.py" and "texas_data.py": the first downloads the dataset automatically and the second generates the queries automatically. Put the first script in "./dataset/" and the second in "./dataset/dataset_name/", e.g., "./dataset/texas/".
5: from "./dataset/", run "python texas_download_pyg.py"
6: from "./dataset/texas/", run "python texas_data.py"
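Steps 1 to 6 can be automated for any dataset that follows the texas naming pattern. This is a minimal sketch under two assumptions: the scripts sit directly in dataset_dealing/ as `<name>_download_pyg.py` and `<name>_data.py`, and each script is meant to run from the folder it is copied into. `prepare_dataset` is a hypothetical helper.

```python
import os
import shutil
import subprocess
import sys


def prepare_dataset(repo_root, name):
    """Automate steps 1-6 for one dataset (hypothetical helper).

    Assumes scripts named <name>_download_pyg.py and <name>_data.py
    in <repo_root>/dataset_dealing, following the texas example.
    """
    dataset_dir = os.path.join(repo_root, "dataset")
    target_dir = os.path.join(dataset_dir, name)
    os.makedirs(target_dir, exist_ok=True)  # steps 1-3: ./dataset/<name>/

    src_dir = os.path.join(repo_root, "dataset_dealing")
    download_script = f"{name}_download_pyg.py"
    query_script = f"{name}_data.py"
    # Step 4: place each script where the instructions expect it.
    shutil.copy(os.path.join(src_dir, download_script), dataset_dir)
    shutil.copy(os.path.join(src_dir, query_script), target_dir)

    # Steps 5-6: run each script from the folder it was copied into.
    subprocess.run([sys.executable, download_script], cwd=dataset_dir, check=True)
    subprocess.run([sys.executable, query_script], cwd=target_dir, check=True)
```

For example, `prepare_dataset(".", "texas")` run from the repository root would reproduce the texas walkthrough above.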

Folder Structure

.
├── dataset                     # make a new folder by "mkdir dataset"
├── dataset_dealing             # scripts to download and process the datasets automatically
├── logs                        # the running logs
├── model                       # the saved model
├── pretrain_result             # the pretrained latent representation
├── scripts                     # the scripts to run the model and the experiments
├── accuracy_globalsearch.py    # the IESG solver-Global_Binary_Search
├── accuracy_localsearch.py     # the IESG solver-Local_Search
├── data_loader.py              # data loader
├── early_stop.py               # early stop module to alleviate overfitting
├── layer.py                    # the layer in the network
├── link_pretrain.py            # the main entry point of the model
├── lr.py                       # the learning rate module
├── model.py                    # the model definition
├── utils.py                    # utility functions
├── test_all_global.sh          # test the performance of all datasets by global binary search
├── test_all_local.sh           # test the performance of all datasets by local search
├── training_all.sh             # the script to train all the models
└── README.md
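early_stop.py is described above as an early stopping module that alleviates overfitting. For readers unfamiliar with the technique, a generic patience-based version is sketched below; it is not the repository's implementation, and the class name and the `patience`/`min_delta` parameters are assumptions.

```python
class EarlyStopping:
    """Generic patience-based early stopping (sketch, not the repo's early_stop.py).

    Signals a stop once the monitored loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # best loss seen so far
        self.bad_epochs = 0       # consecutive epochs without improvement

    def step(self, loss):
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, call `stopper.step(val_loss)` once per epoch and break when it returns True; with `patience=2`, the losses 1.0, 0.8, 0.85, 0.9 trigger a stop at the fourth epoch.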

Citation

@article{wang2024efficient,
  title={Efficient Unsupervised Community Search with Pre-trained Graph Transformer},
  author={Wang, Jianwei and Wang, Kai and Lin, Xuemin and Zhang, Wenjie and Zhang, Ying},
  journal={arXiv preprint arXiv:2403.18869},
  year={2024}
}