This is our implementation for the paper (arXiv):
CLARE: A Semi-supervised Community Detection Algorithm
This repository contains the following contents:
.
├── Locator --> (The folder containing Community Locator source code)
├── Rewriter --> (The folder containing Community Rewriter source code)
├── ckpts --> (The folder saving checkpoint files)
├── dataset --> (The folder containing 7 used datasets)
├── run.py --> (The main code file. The code is run through this file)
└── utils --> (The folder containing utils functions)
Note that you have to create a ckpts
folder to save contents.
Raw datasets are available at SNAP(http://snap.stanford.edu/data/index.html) and pre-processing details are explained in our paper.
We select LiveJournal, DBLP and Amazon, in the Networks with ground-truth communities part.
We provide 7 datasets as below. Each of them contains a community file {name}-1.90.cmty.txt
and an edge file {name}-1.90.ungraph.txt
.
├── dataset
│ ├── amazon
│ ├── amazon_dblp
│ ├── dblp
│ ├── dblp_amazon
│ ├── dblp_lj
│ ├── lj
│ ├── lj_dblp
-
You need to set up the environment for running the experiments (Python 3.7 or above)
-
Install Pytorch with version 1.8.0 or later
-
Install torch-geometric package with version 2.0.1
Note that it may need to appropriately install the package
torch-geometric
based on the CUDA version (or CPU version if GPU is not available). Please refer to the official website https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html for more information of installing prerequisites.For example (Mac / CPU)
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cpu.html
-
Install Deepsnap with version 0.2.1
pip install deepsnap
Execute the run.py
file
python run.py --dataset=amazon --conv_type=GCN
Main arguments:
--dataset [amazon, dblp, lj, amazon_dblp, dblp_amazon, dblp_lj, lj_dblp]: the dataset to run
--conv_type [GCN, GIN, SAGE]: GNN type in Community Locator
--n_layers: ego-net dimensions & number of GNN layers
--pred_size: total number of predicted communities
--agent_lr: the learning rate of Community Rewriter
For more argument options, please refer to run.py
Please cite our paper if you make advantage of the CLARE in your own work:
@inproceedings{wu2022clare,
title={CLARE: A Semi-supervised Community Detection Algorithm},
author={Wu, Xixi and Xiong, Yun and Zhang, Yao and Jiao, Yizhu and Shan, Caihua and Sun, Yiheng and Zhu, Yangyong and Philip S. Yu},
booktitle={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
year={2022},
organization={ACM}
}