PMGT: A Python repository from sdu-zyx

This is the implementation of PMGT, described in the paper "Pre-training Graph Transformer with Multimodal Side Information for Recommendation" (ACM MM 2021).

Environment

To run the code, you need the following dependencies.

Pre-training:
- Python 3
- TensorFlow-gpu 1.13.1
- graphlearn 1.0.1

Downstream:
- Python 3
- PyTorch 1.8.1

Files in the Folder

config_file/ : hyper-parameters;
data/ : data pre-processing & pre-processed file;
down_rec/ : implementation of downstream tasks;
utils/ : optimization, BERT modules, and other utilities.

Quick Validation

The pre-trained item representations for the video game dataset can be downloaded from here. Move the unzipped files to the folder 'data/video/', then run the downstream task code directly.

Testing on Recommendation Task

Using the pre-trained item representations.
$ python run_rec.py --data_type video --pretrain 1 --lr 0.001 --l2_re 0

Using the randomly initialized item representations.
$ python run_rec.py --data_type video --pretrain 0 --lr 0.001 --l2_re 0
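The two recommendation commands above differ only in the value of --pretrain. As a minimal sketch, the snippet below builds both command lines from the flags shown in this README so the two settings can be compared side by side. The helper name build_rec_command is hypothetical and not part of the repository; the snippet only prints the commands and does not launch run_rec.py.

```python
import shlex


def build_rec_command(data_type="video", pretrain=1, lr=0.001, l2_re=0):
    """Return the argv list for run_rec.py.

    Hypothetical helper: it only reuses the flags that appear in this
    README (--data_type, --pretrain, --lr, --l2_re).
    """
    return [
        "python", "run_rec.py",
        "--data_type", str(data_type),
        "--pretrain", str(int(pretrain)),
        "--lr", str(lr),
        "--l2_re", str(l2_re),
    ]


# pretrain=1 uses the pre-trained item representations,
# pretrain=0 uses randomly initialized ones.
commands = [build_rec_command(pretrain=p) for p in (1, 0)]

for cmd in commands:
    # Print the command instead of executing it; each list could be
    # passed to subprocess.run(cmd, check=True) from the folder that
    # contains run_rec.py.
    print(shlex.join(cmd))
```

Keeping the arguments in a list rather than a single string avoids shell quoting issues if the command is later executed from Python.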

Testing on CTR Prediction Task

Using the pre-trained item representations.
$ python run_ctr.py --data_type video --pretrain 1 --lr 0.001 --l2_re 0.0001

Using the randomly initialized item representations.
$ python run_ctr.py --data_type video --pretrain 0 --lr 0.001 --l2_re 0.0001

Example of Running the Code

Data Preprocessing

The experimental datasets are collected from the Amazon Review Datasets:

- Video Games
- Toys and Games
- Tools and Home Improvement

Use the original data to build the pre-training graph dataset and the downstream task dataset.
$ python data_process.py

Note that the experimental datasets used in the original paper were processed with internal APIs. As a result, the statistics below differ somewhat from those reported in the original paper.

Statistics of Experimental Datasets

Columns # Users, # Items, and # Interactions describe the data for downstream tasks; # Nodes and # Edges describe the item graph.

Datasets	# Users	# Items	# Interactions	# Nodes	# Edges	Threshold
VG	27,988	6,551	98,278	7,252	88,606	3
TG	118,153	6,238	294,507	6,451	15,363	4
THI	164,717	5,751	431,455	5,982	12,927	3
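To make the statistics above easier to compare across datasets, the following sketch derives two simple ratios from the table: the density of the user-item interaction matrix and the number of item-graph edges per node. The numbers are copied from the table; the function names are illustrative and not part of the repository.

```python
# Counts copied from the statistics table in this README.
STATS = {
    "VG":  {"users": 27988,  "items": 6551, "interactions": 98278,
            "nodes": 7252, "edges": 88606},
    "TG":  {"users": 118153, "items": 6238, "interactions": 294507,
            "nodes": 6451, "edges": 15363},
    "THI": {"users": 164717, "items": 5751, "interactions": 431455,
            "nodes": 5982, "edges": 12927},
}


def interaction_density(s):
    """Fraction of all possible user-item pairs that have an interaction."""
    return s["interactions"] / (s["users"] * s["items"])


def edges_per_node(s):
    """Number of item-graph edges divided by the number of nodes."""
    return s["edges"] / s["nodes"]


for name, s in STATS.items():
    print(f"{name}: density={interaction_density(s):.6f}, "
          f"edges/node={edges_per_node(s):.2f}")
```

Both ratios are plain arithmetic on the reported counts; they say nothing about how the graph was built beyond what the table states.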

Pre-training

Pre-training PMGT

$ python main.py --data_type video --is_train 1

Saving Item Representations Pre-trained by PMGT

$ python main.py --data_type video --is_train 0

Testing on Downstream Tasks

See the Quick Validation section for details.

Experiment Results

Top-N Recommendation

Datasets	Methods	REC-R@10	REC-R@20	REC-N@10	REC-N@20
VG	NCF	0.1698	0.2510	0.0970	0.1192
VG	NCF-PMGT	0.2588	0.3518	0.1688	0.1945
TG	NCF	0.2598	0.3295	0.1942	0.2129
TG	NCF-PMGT	0.2926	0.3682	0.2194	0.2397
THI	NCF	0.2687	0.3188	0.2232	0.2367
THI	NCF-PMGT	0.2909	0.3509	0.2390	0.2552
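The results above can be summarized as the relative gain of NCF-PMGT over NCF on each metric. The sketch below computes (improved - baseline) / baseline from the values in the table; the variable and function names are illustrative and not taken from the repository.

```python
METRICS = ["REC-R@10", "REC-R@20", "REC-N@10", "REC-N@20"]

# Values copied from the results table in this README.
RESULTS = {
    "VG":  {"NCF":      [0.1698, 0.2510, 0.0970, 0.1192],
            "NCF-PMGT": [0.2588, 0.3518, 0.1688, 0.1945]},
    "TG":  {"NCF":      [0.2598, 0.3295, 0.1942, 0.2129],
            "NCF-PMGT": [0.2926, 0.3682, 0.2194, 0.2397]},
    "THI": {"NCF":      [0.2687, 0.3188, 0.2232, 0.2367],
            "NCF-PMGT": [0.2909, 0.3509, 0.2390, 0.2552]},
}


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# gains[dataset][metric] is the relative gain of NCF-PMGT over NCF.
gains = {
    dataset: {
        metric: relative_gain(base, pmgt)
        for metric, base, pmgt in zip(METRICS, rows["NCF"], rows["NCF-PMGT"])
    }
    for dataset, rows in RESULTS.items()
}

for dataset, per_metric in gains.items():
    summary = ", ".join(f"{m}: {g:+.1%}" for m, g in per_metric.items())
    print(f"{dataset}: {summary}")
```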
