VGATM

This is the TensorFlow implementation of the KDD 2022 paper "Variational Graph Author Topic Modeling" by Delvin Ce Zhang and Hady W. Lauw.

VGATM is a Graph Neural Network model that extracts interpretable topics for documents with authors and venues. The learned document topics can then be used for downstream tasks such as document classification and citation prediction.

Implementation Environment

Python == 3.6
tensorflow == 1.9.0
numpy == 1.17.4
scikit-learn == 0.23.2
scipy == 1.5.2

Run

python main.py -div kl -p gaussian # VGATM-G (unsupervised)
python main.py -div kl -p dirichlet # VGATM-D (unsupervised)
python main.py -div wasserstein -p gaussian # VGATM-W (unsupervised)
python main.py -div kl -p gaussian -sup 1 # VGATM-G (supervised)
python main.py -div kl -p dirichlet -sup 1 # VGATM-D (supervised)
python main.py -div wasserstein -p gaussian -sup 1 # VGATM-W (supervised)
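
To run several variants in one go, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name build_commands and the VARIANTS table are hypothetical, and only the -div, -p and -sup flags shown above are used. It assumes main.py sits in the current working directory.

```python
import sys

# The three variants listed in the Run section, as (name, divergence, prior).
VARIANTS = [
    ("VGATM-G", "kl", "gaussian"),
    ("VGATM-D", "kl", "dirichlet"),
    ("VGATM-W", "wasserstein", "gaussian"),
]


def build_commands(supervised=False, script="main.py"):
    """Return a dict mapping variant name to an argv list.

    Hypothetical helper: it only combines the documented flags
    (-div, -p, -sup) and does not launch anything itself.
    """
    commands = {}
    for name, div, prior in VARIANTS:
        argv = [sys.executable, script, "-div", div, "-p", prior]
        if supervised:
            # -sup 1 switches on label supervision (default is 0).
            argv += ["-sup", "1"]
        commands[name] = argv
    return commands


# Print the commands; pass each argv to subprocess.run(argv, check=True)
# to actually start training.
for name, argv in build_commands(supervised=True).items():
    print(name, " ".join(argv[1:]))
```

Building argv lists rather than shell strings avoids quoting issues when the commands are later handed to subprocess.run.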

Parameter Setting

-ne: number of training epochs, default=15
-lr: learning rate, default=0.01
-ms: minibatch size, default=128
-dn: dataset name, ml or pl, default=ml
-nt: number of topics, default=64
-sup: label supervision, default=0 (no supervision)
-tr: training ratio of documents, default=0.8
-nn: number of sampled neighbours for convolution, default=5
-ws: word-word graph sliding window size, default=5
-wn: word-word graph number of neighbours for each word, default=5
-nl: number of convolutional steps L, default=2
-div: variational divergence metric R, kl or wasserstein, default=wasserstein
-p: predefined prior distribution p(.), gaussian or dirichlet, default=gaussian
-reg_div: hyperparameter \lambda_reg controlling variational divergence term, default=0.01
-reg_l2: hyperparameter for l2 regularization, default=1e-3
-kp: dropout keep probability, default=0.9
-ap: author prediction, default=0 (no author prediction)
-rs: random seed
-gpu: gpu
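
For reference, the flags and defaults listed above could be declared with argparse roughly as follows. This is an illustrative sketch and not the actual parser in main.py: the argument types, the choices lists, and the None defaults for -rs and -gpu are assumptions, since the list above does not state them.

```python
import argparse


def build_parser():
    """Sketch of a parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(description="VGATM options (illustrative)")
    p.add_argument("-ne", type=int, default=15, help="number of training epochs")
    p.add_argument("-lr", type=float, default=0.01, help="learning rate")
    p.add_argument("-ms", type=int, default=128, help="minibatch size")
    p.add_argument("-dn", choices=["ml", "pl"], default="ml", help="dataset name")
    p.add_argument("-nt", type=int, default=64, help="number of topics")
    p.add_argument("-sup", type=int, default=0, help="label supervision (0 = off)")
    p.add_argument("-tr", type=float, default=0.8, help="training ratio of documents")
    p.add_argument("-nn", type=int, default=5, help="sampled neighbours for convolution")
    p.add_argument("-ws", type=int, default=5, help="word-word graph sliding window size")
    p.add_argument("-wn", type=int, default=5, help="word-word graph neighbours per word")
    p.add_argument("-nl", type=int, default=2, help="number of convolutional steps L")
    p.add_argument("-div", choices=["kl", "wasserstein"], default="wasserstein",
                   help="variational divergence metric R")
    p.add_argument("-p", choices=["gaussian", "dirichlet"], default="gaussian",
                   help="predefined prior distribution")
    p.add_argument("-reg_div", type=float, default=0.01,
                   help="weight of the variational divergence term")
    p.add_argument("-reg_l2", type=float, default=1e-3, help="l2 regularization weight")
    p.add_argument("-kp", type=float, default=0.9, help="dropout keep probability")
    p.add_argument("-ap", type=int, default=0, help="author prediction (0 = off)")
    # Assumption: no defaults are documented for the next two flags.
    p.add_argument("-rs", type=int, default=None, help="random seed")
    p.add_argument("-gpu", type=str, default=None, help="gpu")
    return p


args = build_parser().parse_args(["-div", "kl", "-p", "dirichlet", "-sup", "1"])
print(args.div, args.p, args.sup, args.nt)
```

Note that running with no flags selects the Wasserstein divergence with a Gaussian prior, that is, the unsupervised VGATM-W configuration.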

Output

Results are saved to the ./results directory.

doc_topic_dist_training.txt contains topic proportions of training documents.
doc_topic_dist_test.txt contains topic proportions of test documents.
word_topic_dist.txt contains topic proportions of words.
author_topic_dist.txt contains topic proportions of authors.
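
The output files can be inspected with a few lines of Python. The sketch below assumes each file is a plain-text matrix with one row per item (document, word, or author) and one whitespace-separated column per topic; this layout is not stated here, so check it against the actual files. The helper names parse_topic_rows and dominant_topics are hypothetical.

```python
def parse_topic_rows(lines):
    """Parse an iterable of text lines into a list of float rows.

    Assumption: one row per item, topic proportions separated by whitespace.
    Blank lines are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if fields:
            rows.append([float(value) for value in fields])
    return rows


def dominant_topics(rows):
    """Return, for each row, the index of the topic with the largest proportion."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Stand-in data with 3 topics; with real output one would instead use, e.g.,
#   with open("./results/doc_topic_dist_training.txt") as handle:
#       rows = parse_topic_rows(handle)
sample = ["0.7 0.2 0.1", "0.1 0.3 0.6", ""]
rows = parse_topic_rows(sample)
print(dominant_topics(rows))
```

With the default -nt setting, each row would be expected to hold 64 values.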

Reference

If you use our paper, code, or data, please cite:

@inproceedings{vgatm,
  title={Variational Graph Author Topic Modeling},
  author={Zhang, Delvin Ce and Lauw, Hady W},
  booktitle={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={2429--2438},
  year={2022}
}
