PMGT

This is the implementation of PMGT, described in the paper "Pre-training Graph Transformer with Multimodal Side Information for Recommendation" (ACM MM 2021).

Environment

To run the code, you need the following dependencies:

  • pre-training:

    • Python 3
    • TensorFlow-gpu 1.13.1
    • graphlearn 1.0.1
  • downstream:

    • Python 3
    • PyTorch 1.8.1

Files in The Folder

  • config_file/ : hyper-parameters;
  • data/ : data pre-processing scripts and pre-processed files;
  • down_rec/ : implementation of the downstream tasks;
  • utils/ : optimization, BERT modules, and other utilities.

Quick Validation

The pre-trained item representations for the video game dataset can be downloaded from here. Move the unzipped files to the folder 'data/video/', then run the downstream task code directly.

Testing on Recommendation Task

Using the pre-trained item representations.
$ python run_rec.py --data_type video --pretrain 1 --lr 0.001 --l2_re 0

Using the randomly initialized item representations.
$ python run_rec.py --data_type video --pretrain 0 --lr 0.001 --l2_re 0

Testing on CTR Prediction Task

Using the pre-trained item representations.
$ python run_ctr.py --data_type video --pretrain 1 --lr 0.001 --l2_re 0.0001

Using the randomly initialized item representations.
$ python run_ctr.py --data_type video --pretrain 0 --lr 0.001 --l2_re 0.0001
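
The four downstream runs differ only in the script, the `--pretrain` flag, and the `--l2_re` value, so they can be generated in one place. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `all_commands` are hypothetical, and it assumes that `run_ctr.py` accepts `--data_type video` in the same form as `run_rec.py`.

```python
import subprocess

# (script, learning rate, L2 regularization) copied from the commands above.
RUNS = [
    ("run_rec.py", 0.001, 0),
    ("run_ctr.py", 0.001, 0.0001),
]


def build_command(script, pretrain, lr, l2_re, data_type="video"):
    """Return the argument list for one downstream run (hypothetical helper)."""
    return [
        "python", script,
        "--data_type", data_type,
        "--pretrain", str(pretrain),
        "--lr", str(lr),
        "--l2_re", str(l2_re),
    ]


def all_commands():
    """Pre-trained (1) and randomly initialized (0) runs for both tasks."""
    return [
        build_command(script, pretrain, lr, l2_re)
        for script, lr, l2_re in RUNS
        for pretrain in (1, 0)
    ]


if __name__ == "__main__":
    for cmd in all_commands():
        print(" ".join(cmd))
        # Uncomment to launch the run from the repository root:
        # subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check them against the ones listed above before launching anything.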

Example of Running the Code

Data Preprocessing

The experimental datasets are collected from the Amazon Review Datasets.

  • Video Games
  • Toys and Games
  • Tools and Home Improvement

Use the original data to build the pre-training graph dataset and the downstream task datasets:
$ python data_process.py

Note that the experimental datasets used in the original paper were processed with internal APIs. As a result, the statistics below differ somewhat from those reported in the original paper.
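
The statistics table lists a threshold for each item graph, but this README does not spell out how it is applied. One plausible reading is that two items are linked when they co-occur in the histories of at least `threshold` users. The sketch below illustrates that reading only; it is an assumption, not the logic of `data_process.py`, and the function name `build_item_graph` and the `>=` comparison are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_item_graph(interactions, threshold):
    """Link two items if at least `threshold` users interacted with both.

    interactions: iterable of (user, item) pairs.
    Returns a dict mapping an ordered item pair (a, b), a < b,
    to its co-occurrence count. Assumed construction, for illustration.
    """
    # Collect the set of items each user interacted with.
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)

    # Count, for every item pair, how many users share it.
    co_counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    # Keep only pairs that reach the threshold.
    return {pair: n for pair, n in co_counts.items() if n >= threshold}


if __name__ == "__main__":
    toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
    print(build_item_graph(toy, threshold=2))
```

Under this reading, raising the threshold prunes weak co-occurrence edges and produces a sparser graph.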

Statistics of Experimental Datasets

Datasets  Data for Downstream Tasks        Item Graph         Threshold
          # Users  # Items  # Interact.    # Nodes  # Edges
VG         27,988    6,551       98,278      7,252   88,606   3
TG        118,153    6,238      294,507      6,451   15,363   4
THI       164,717    5,751      431,455      5,982   12,927   3
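
To put these counts in perspective, the sketch below derives two summary figures from them: the density of the user-item interaction matrix and the average node degree of the item graph. The numbers are copied from the table; the degree formula assumes the edges are undirected and each is counted once, which this README does not state.

```python
# Counts copied from the statistics table above.
STATS = {
    "VG":  dict(users=27988,  items=6551, interactions=98278,  nodes=7252, edges=88606),
    "TG":  dict(users=118153, items=6238, interactions=294507, nodes=6451, edges=15363),
    "THI": dict(users=164717, items=5751, interactions=431455, nodes=5982, edges=12927),
}


def interaction_density(s):
    """Fraction of the user-item matrix that is observed."""
    return s["interactions"] / (s["users"] * s["items"])


def average_degree(s):
    """Mean node degree, assuming undirected edges counted once each."""
    return 2 * s["edges"] / s["nodes"]


if __name__ == "__main__":
    for name, s in STATS.items():
        print(f"{name}: density={interaction_density(s):.6f}, "
              f"avg degree={average_degree(s):.2f}")
```

Under that assumption, the VG item graph is markedly denser than the TG and THI graphs, even though all three contain a similar number of nodes.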

Pre-training

Pre-training PMGT

$ python main.py --data_type video --is_train 1

Saving Item Representations Pre-trained by PMGT

$ python main.py --data_type video --is_train 0

Testing on Downstream Tasks

See the details in the Quick Validation section.

Experiment Results

Top-N Recommendation

Datasets  Methods   REC-R@10  REC-R@20  REC-N@10  REC-N@20
VG        NCF       0.1698    0.2510    0.0970    0.1192
          NCF-PMGT  0.2588    0.3518    0.1688    0.1945
TG        NCF       0.2598    0.3295    0.1942    0.2129
          NCF-PMGT  0.2926    0.3682    0.2194    0.2397
THI       NCF       0.2687    0.3188    0.2232    0.2367
          NCF-PMGT  0.2909    0.3509    0.2390    0.2552
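
The reported scores can also be read as relative gains of NCF-PMGT over NCF. The following sketch computes them from the values in the results table; the helper name `relative_gain` is hypothetical and the script is not part of the repository.

```python
METRICS = ("REC-R@10", "REC-R@20", "REC-N@10", "REC-N@20")

# Scores copied from the results table above, in METRICS order.
RESULTS = {
    "VG":  {"NCF": (0.1698, 0.2510, 0.0970, 0.1192),
            "NCF-PMGT": (0.2588, 0.3518, 0.1688, 0.1945)},
    "TG":  {"NCF": (0.2598, 0.3295, 0.1942, 0.2129),
            "NCF-PMGT": (0.2926, 0.3682, 0.2194, 0.2397)},
    "THI": {"NCF": (0.2687, 0.3188, 0.2232, 0.2367),
            "NCF-PMGT": (0.2909, 0.3509, 0.2390, 0.2552)},
}


def relative_gain(base, new):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def gains(dataset):
    """Per-metric gains of NCF-PMGT over NCF for one dataset."""
    rows = RESULTS[dataset]
    return {
        metric: relative_gain(b, n)
        for metric, b, n in zip(METRICS, rows["NCF"], rows["NCF-PMGT"])
    }


if __name__ == "__main__":
    for dataset in RESULTS:
        line = ", ".join(f"{m}: {g:+.1f}%" for m, g in gains(dataset).items())
        print(f"{dataset}: {line}")
```

On these numbers, the gain is largest on VG and smaller on TG and THI.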