Applying Relational-GCN (R-GCN) to heterogeneous graph datasets.
Paper: https://arxiv.org/abs/1703.06103
Supported datasets: Amazon, DBLP, IMDb and ACM.
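
An R-GCN layer, as described in the paper above, updates each node by summing degree-normalised messages from its neighbours under each relation r. Each message is transformed by a relation-specific weight W_r, a self-connection term W_0 h_i is added, and a nonlinearity is applied. The paper also proposes basis decomposition, W_r = sum_b a_rb V_b, to limit the number of parameters. The NumPy sketch below illustrates one such layer on dense arrays. It is not the code used by entity_classify.py; the function name `rgcn_layer`, the `(src, dst, rel)` edge-list format and the choice of ReLU are assumptions.

```python
import numpy as np


def rgcn_layer(h, edges, num_rels, bases, coeffs, w_self):
    """One R-GCN layer with basis decomposition (illustrative sketch).

    h:      (N, d_in) node features
    edges:  list of (src, dst, rel) triples; messages flow src -> dst
    bases:  (B, d_in, d_out) shared basis matrices V_b
    coeffs: (num_rels, B) coefficients a_rb, so W_r = sum_b a_rb V_b
    w_self: (d_in, d_out) self-connection weight W_0
    """
    n = h.shape[0]
    # Build every relation weight W_r from the shared bases: (num_rels, d_in, d_out).
    w_rel = np.einsum("rb,bio->rio", coeffs, bases)

    # c_{i,r}: number of incoming neighbours of node i under relation r.
    norm = np.zeros((n, num_rels))
    for _src, dst, rel in edges:
        norm[dst, rel] += 1

    out = h @ w_self  # self-connection term W_0 h_i for every node
    for src, dst, rel in edges:
        # Normalised, relation-specific message from src to dst.
        out[dst] += (h[src] @ w_rel[rel]) / norm[dst, rel]
    return np.maximum(out, 0.0)  # ReLU as the nonlinearity (assumption)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(3, 4))
    edges = [(0, 1, 0), (2, 1, 0), (0, 2, 1)]  # toy graph with 2 relations
    out = rgcn_layer(h, edges, num_rels=2,
                     bases=rng.normal(size=(2, 4, 5)),
                     coeffs=rng.normal(size=(2, 2)),
                     w_self=rng.normal(size=(4, 5)))
    print(out.shape)  # (3, 5)
```

The per-edge Python loop keeps the arithmetic visible; a practical implementation would use sparse or scatter operations instead.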
Usage:

python3 entity_classify.py -d [DATASET] -e [EPOCHS] --testing --gpu [GPU]
where DATASET is one of {amazon, imdb, dblp, acm}, and --gpu takes -1 to run on the CPU or a GPU index (e.g. 0) to run on that GPU.
Example:

python3 entity_classify.py -d amazon --testing --gpu 0
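
The command-line flags above could be parsed along the following lines. This is a hypothetical `argparse` sketch, not the parser in entity_classify.py: the long option names (`--dataset`, `--n-epochs`), the default of 50 epochs, the `--gpu` default and the help texts are all assumptions. Restricting `-d` to a fixed set of choices makes a misspelled dataset name fail immediately instead of later during loading.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented entity_classify.py flags."""
    parser = argparse.ArgumentParser(description="R-GCN entity classification")
    parser.add_argument("-d", "--dataset", required=True,
                        choices=["amazon", "imdb", "dblp", "acm"],
                        help="dataset to train and evaluate on")
    parser.add_argument("-e", "--n-epochs", type=int, default=50,
                        help="number of training epochs (default is an assumption)")
    parser.add_argument("--testing", action="store_true",
                        help="report results on the test split (assumed meaning)")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="device selector; see the --gpu note in this README")
    return parser


# Parse the example command from this README instead of sys.argv.
args = build_parser().parse_args(["-d", "amazon", "--testing", "--gpu", "0"])
print(args.dataset, args.n_epochs, args.testing, args.gpu)
```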
- ACM

| Author | Paper | Subject | Paper-Author | Paper-Subject | Features | Train | Val | Test |
|---|---|---|---|---|---|---|---|---|
| 5,912 | 3,025 | 57 | 9,936 | 3,025 | 1,902 | 600 | 300 | 2,125 |

- IMDb

| Movie | Actor | Director | Movie-Actor | Movie-Director | Train | Val | Test |
|---|---|---|---|---|---|---|---|
| 4,780 | 5,841 | 2,269 | 14,340 | 4,780 | 300 | 300 | 2,687 |

- DBLP

| Author | Paper | Conf (Venue) | Term | Paper-Author | Paper-Conf | Paper-Term | Train | Val | Test |
|---|---|---|---|---|---|---|---|---|---|
| 4,057 | 14,328 | 20 | 8,789 | 19,645 | 14,328 | 88,420 | 800 | 400 | 2,857 |

- Fraud Amazon Dataset

The Amazon dataset contains product reviews from the Musical Instruments category. Users with more than 80% helpful votes are labelled as benign, and users with less than 20% helpful votes are labelled as fraudulent, so fraudulent-user detection on this dataset is a binary classification task. 25 handcrafted features are used as the raw node features. Users are the nodes of the graph, connected by three relations:

1. U-P-U: connects users who reviewed at least one common product.
2. U-S-U: connects users who gave at least one identical star rating within one week.
3. U-V-U: connects users whose mutual review-text similarity (measured by TF-IDF) is in the top 5% among all users.
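
The labelling rule and the three relations described above can be sketched in plain Python. Everything here is an illustrative assumption rather than the original preprocessing: reviews are hypothetical `(user, product, stars, day)` tuples, label 1 means fraudulent and 0 benign, users with no votes stay unlabeled, "within one week" is read as at most 7 days apart, U-S-U does not require the same product, and "top 5%" is taken over all user pairs using whitespace tokenisation and a plain tf * log(N/df) weighting.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes):
    """Return 1 (fraudulent), 0 (benign) or None (unlabeled) from the helpful-vote ratio."""
    if total_votes == 0:
        return None  # assumption: users without any votes stay unlabeled
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0
    if ratio < 0.2:
        return 1
    return None


def upu_edges(reviews):
    """U-P-U: link users who reviewed at least one common product."""
    users_by_product = defaultdict(set)
    for user, product, _stars, _day in reviews:
        users_by_product[product].add(user)
    edges = set()
    for users in users_by_product.values():
        edges.update(combinations(sorted(users), 2))
    return edges


def usu_edges(reviews, window_days=7):
    """U-S-U: link users who gave the same star rating at most window_days apart."""
    reviews_by_stars = defaultdict(list)
    for user, _product, stars, day in reviews:
        reviews_by_stars[stars].append((day, user))
    edges = set()
    for entries in reviews_by_stars.values():
        entries.sort()
        for i, (day_i, user_i) in enumerate(entries):
            for day_j, user_j in entries[i + 1:]:
                if day_j - day_i > window_days:
                    break  # entries are sorted by day, so later ones are further away
                if user_i != user_j:
                    edges.add(tuple(sorted((user_i, user_j))))
    return edges


def uvu_edges(user_texts, top_fraction=0.05):
    """U-V-U: link user pairs whose TF-IDF cosine similarity is in the top fraction."""
    counts = {u: Counter(text.lower().split()) for u, text in user_texts.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n_users = len(counts)
    vectors = {}
    for user, c in counts.items():
        total = sum(c.values())
        vectors[user] = {t: (k / total) * math.log(n_users / doc_freq[t]) for t, k in c.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Score every user pair, then keep the highest-scoring top_fraction of pairs.
    scored = sorted(
        ((cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(sorted(vectors), 2)),
        reverse=True,
    )
    keep = max(1, math.ceil(top_fraction * len(scored)))
    return {(a, b) for _score, a, b in scored[:keep]}
```

Building U-P-U and U-S-U by grouping first (per product, per star value) avoids comparing every user pair, which matters at the scale of the edge counts reported for this dataset.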

| Nodes | U-P-U | U-S-U | U-V-U | Positive (fraudulent) | Negative (benign) | Unlabeled |
|---|---|---|---|---|---|---|
| 11,944 | 351,216 | 7,132,958 | 2,073,474 | 821 | 7,818 | 3,305 |
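
A quick arithmetic check on the dataset tables. The ACM and DBLP train/val/test splits add up exactly to the number of papers and authors respectively, which suggests (an inference, not stated in the tables) that those are the labelled node types. The three Amazon label groups partition the 11,944 users, and fraudulent users make up roughly 9.5% of the labelled ones. The variable names are illustrative.

```python
# Counts copied from the tables in this README; variable names are illustrative.
acm_papers, acm_split = 3025, (600, 300, 2125)      # (train, val, test)
dblp_authors, dblp_split = 4057, (800, 400, 2857)   # (train, val, test)
amazon = {"nodes": 11944, "fraudulent": 821, "benign": 7818, "unlabeled": 3305}

# The splits cover every paper (ACM) and every author (DBLP).
assert sum(acm_split) == acm_papers
assert sum(dblp_split) == dblp_authors

# The three Amazon label groups partition the user nodes.
assert amazon["fraudulent"] + amazon["benign"] + amazon["unlabeled"] == amazon["nodes"]

# Class balance among labelled Amazon users.
labelled = amazon["fraudulent"] + amazon["benign"]
fraud_rate = amazon["fraudulent"] / labelled
print(f"{labelled} labelled users, fraud rate {fraud_rate:.3f}")  # 8639 labelled users, fraud rate 0.095
```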