Relational-GCN (RGCN)

This repository applies Relational-GCN (RGCN) to node classification on heterogeneous graph datasets: Amazon, IMDb, DBLP, and ACM.
Paper: https://arxiv.org/abs/1703.06103

Node Classification Task

python3 entity_classify.py -d [DATASET] -e [EPOCHS] --testing --gpu [CPU/GPU]

where DATASET is one of {amazon, imdb, dblp, acm}, EPOCHS is the number of training epochs, and --gpu takes 0 (CPU) or -1 (GPU).

Example: running RGCN on the Amazon dataset

python3 entity_classify.py -d amazon --testing --gpu 0 
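
The flags in the commands above could be parsed along the following lines. This is a minimal sketch, not the repository's actual `entity_classify.py`: the long option names (`--dataset`, `--epochs`), the default values, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags shown in the README commands;
    # the real script may define them differently.
    parser = argparse.ArgumentParser(description="RGCN node classification (sketch)")
    parser.add_argument("-d", "--dataset", required=True,
                        choices=["amazon", "imdb", "dblp", "acm"],
                        help="name of the dataset to train on")
    # The default of 50 epochs is an assumption, not taken from the repository.
    parser.add_argument("-e", "--epochs", type=int, default=50,
                        help="number of training epochs")
    parser.add_argument("--testing", action="store_true",
                        help="boolean switch, passed as in the README commands")
    parser.add_argument("--gpu", type=int, default=0,
                        help="device selector, interpreted as described in the README")
    return parser


# Parse the example command line from the README instead of sys.argv.
args = build_parser().parse_args(["-d", "amazon", "--testing", "--gpu", "0"])
print(args.dataset, args.epochs, args.testing, args.gpu)
```

Restricting `-d` with `choices` makes a mistyped dataset name fail immediately with a usage message rather than later during data loading.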

Dataset Statistics

  • ACM
    Author  Paper  Subject  Paper-Author  Paper-Subject  Features  Train  Val  Test
    5,912   3,025  57       9,936         3,025          1,902     600    300  2,125
  • IMDb
    Movie  Actor  Director  Movie-Actor  Movie-Director  Train  Val  Test
    4,780  5,841  2,269     14,340       4,780           300    300  2,687
  • DBLP
    Author  Paper   Conf  Venue  Paper-Author  Paper-Conf  Paper-Term  Train  Val  Test
    4,057   14,328  20    8,789  19,645        14,328      88,420      800    400  2,857
  • Fraud Amazon Dataset
    The Amazon dataset contains product reviews from the Musical Instruments category. Users with more than 80% helpful votes are labelled benign, and users with less than 20% helpful votes are labelled fraudulent. Fraudulent user detection on this dataset is therefore a binary classification task. Each node has 25 handcrafted features as its raw node features.

    Users are the nodes of the graph, which has three relations:
      1. U-P-U connects users who reviewed at least one product in common.
      2. U-S-U connects users who gave at least one identical star rating within one week.
      3. U-V-U connects users whose review texts are among the top 5% most similar (measured by TF-IDF) across all users.

    Nodes   U-P-U    U-S-U      U-V-U      Positive (fraudulent)  Negative (benign)  Unlabeled
    11,944  351,216  7,132,958  2,073,474  821                    7,818              3,305
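
The labelling rule and the U-P-U relation described for the Amazon dataset can be illustrated with a small sketch. It is not the code used to build the dataset: the function names, the `(user_id, product_id)` input format, the 0/1 label encoding, and the handling of users with no votes are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes):
    """Return 0 (benign), 1 (fraudulent), or None (unlabeled)."""
    if total_votes == 0:
        return None  # assumption: users without votes stay unlabeled
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0  # more than 80% helpful votes: benign
    if ratio < 0.2:
        return 1  # less than 20% helpful votes: fraudulent
    return None  # everything in between is left unlabeled


def upu_edges(reviews):
    """Build undirected U-P-U edges from (user_id, product_id) pairs.

    Two users are connected if they reviewed at least one product in common.
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        # Sorting gives each undirected pair a single canonical orientation.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Toy data, invented for illustration only.
reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
print(sorted(upu_edges(reviews)))  # [('u1', 'u2'), ('u1', 'u3')]
print(label_user(9, 10), label_user(1, 10), label_user(5, 10))  # 0 1 None
```

Grouping users by product first avoids comparing every pair of users directly, although a product with many reviewers still contributes a quadratic number of edges, which is consistent with the large edge counts in the table above.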