Accepted as reproducibility paper at RecSys 2023.

Graph-RSs-Reproducibility

This is the official repository for the paper "Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis", accepted at RecSys 2023 (Reproducibility Track).

This repository is heavily dependent on the framework Elliot, so we suggest you refer to the official GitHub page and documentation.

Pre-requisites

We implemented and tested our models in PyTorch==1.12.0, with CUDA 10.2 and cuDNN 8.0. Additionally, some of the graph-based models require PyTorch Geometric, which is compatible with the versions of CUDA and PyTorch indicated above.

Installation guidelines: scenario #1

If you can install CUDA 10.2 on your workstation, you may create the virtual environment with the requirements files included in the repository, as follows:

# PYTORCH ENVIRONMENT (CUDA 10.2, cuDNN 8.0)

$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt
$ pip install -r requirements_torch_geometric.txt
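
Once the environment is created, a quick sanity check can confirm that the expected packages are visible to the interpreter. The following is a minimal sketch and not part of the repository: the helper names are hypothetical, and it assumes the distributions are registered under the names `torch` and `torch-geometric`. It only reads package metadata, so it does not verify that CUDA itself works.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches(installed: Optional[str], expected: str) -> bool:
    """Compare versions while ignoring a local suffix such as '+cu102'."""
    return installed is not None and installed.split("+")[0] == expected


if __name__ == "__main__":
    torch_version = installed_version("torch")
    status = "ok" if matches(torch_version, "1.12.0") else "expected 1.12.0"
    print(f"torch: {torch_version} ({status})")
    # Assumed distribution name; adjust it if your installation differs.
    print(f"torch-geometric: {installed_version('torch-geometric')}")
```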

Installation guidelines: scenario #2

A more convenient way of running experiments is to start a Docker container with CUDA 10.2 already installed. Make sure you have Docker and NVIDIA Container Toolkit installed on your machine (you may refer to this guide). Then, you may use the following Docker image to start a container equipped with CUDA 10.2 and cuDNN 8.0: link.

Datasets

Reproducibility datasets

We used the Gowalla, Yelp 2018, and Amazon Book datasets. The original links may be found here, where the train/test split has already been provided.

After downloading, create three folders ./data/{dataset_name}, one for each dataset. Then, run the script ./map_dataset.py after changing the name of the dataset within the script itself. It will generate the train/test files for each dataset in a format compatible with Elliot (i.e., tsv file with three columns referring to user/item).
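
To make the target format concrete, here is a minimal sketch of such a conversion. It is not the code of ./map_dataset.py. It assumes that each line of the downloaded files lists a user followed by the items they interacted with, separated by whitespace, and that the third tsv column holds a constant rating of 1.0. Both points are assumptions to verify against your files, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, float]


def adjacency_to_triples(lines: Iterable[str], rating: float = 1.0) -> Iterator[Triple]:
    """Yield (user, item, rating) for each 'user item item ...' line (assumed layout)."""
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:  # skip blank lines and users without items
            continue
        user, items = tokens[0], tokens[1:]
        for item in items:
            yield user, item, rating


def format_row(triple: Triple) -> str:
    """Render one triple as a tab-separated row."""
    user, item, rating = triple
    return f"{user}\t{item}\t{rating}"


def convert(src_path: str, dst_path: str) -> None:
    """Read an adjacency-style file and write the three-column tsv."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for triple in adjacency_to_triples(src):
            dst.write(format_row(triple) + "\n")
```

The sketch keeps the identifiers as they appear in the input. Any re-indexing of users or items that the actual script performs is not reproduced here.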

For convenience, we also provide the final tsv files for all the datasets in this repo.

Additional datasets

We directly provide the train/validation/test splits for Allrecipes and BookCrossing in this repo. As already stated for Gowalla, Yelp 2018, and Amazon Book, create one folder for each dataset in ./data/{dataset_name}.

Results

Replication of prior results (RQ1)

To reproduce the results reported in Table 3, run the following:

$ CUBLAS_WORKSPACE_CONFIG=:4096:8 python3.8 -u start_experiments.py \
    --dataset {dataset_name} \
    --model {model_name}

Note that CUBLAS_WORKSPACE_CONFIG=:4096:8 (which may change depending on your configuration) is needed to ensure the complete reproducibility of the experiments (otherwise, PyTorch may run some operations in their non-deterministic version).
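
When several dataset/model pairs have to be run, the command above can be wrapped in a small launcher. The sketch below is not part of the repository. It only rebuilds the same command line and sets CUBLAS_WORKSPACE_CONFIG in the child environment; the helper names and the example dataset/model strings are hypothetical placeholders, so use the names the repository actually expects.

```python
import os
import subprocess
from typing import Dict, Iterable, List


def build_command(dataset: str, model: str) -> List[str]:
    """Mirror the documented command line for one dataset/model pair."""
    return ["python3.8", "-u", "start_experiments.py", "--dataset", dataset, "--model", model]


def build_env(workspace_config: str = ":4096:8") -> Dict[str, str]:
    """Copy the current environment and add the cuBLAS workspace setting."""
    env = dict(os.environ)
    env["CUBLAS_WORKSPACE_CONFIG"] = workspace_config
    return env


def run_all(datasets: Iterable[str], models: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the command for every dataset/model pair."""
    models = list(models)
    commands = [build_command(d, m) for d in datasets for m in models]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, env=build_env(), check=True)
    return commands


if __name__ == "__main__":
    # Placeholder names for illustration only.
    run_all(["dataset_a"], ["model_x", "model_y"], dry_run=True)
```

With dry_run=True the launcher only prints the commands, which is a cheap way to check the pairs before starting long training runs.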

The following table provides links to the specific configuration of hyper-parameters we adopted for each graph-based model (derived from the original papers and/or the codes):

Model     Gowalla  Yelp 2018  Amazon Book
NGCF      link     link       link
DGCF      link     link       link
LightGCN  link     link       link
SGL       ---      link       link
UltraGCN  link     link       link
GFCF      link     link       link

Benchmarking graph CF approaches using alternative baselines (RQ2)

In addition to the graph-based models from above, we train and test four classic (and strong) CF baselines. We also provide pointers to their configuration files with the exploration of hyper-parameters, which can be used to reproduce Table 4. Note that the EASER configuration file is not provided at submission time for Amazon Book due to heavy computational costs.

Model     Gowalla  Yelp 2018  Amazon Book
MostPop   link     link       link
Random    link     link       link
UserkNN   link     link       link
ItemkNN   link     link       link
RP3Beta   link     link       link
EASER     link     link       ---

The best hyper-parameters for each classic CF model (as found in our experiments) are reported in the following:

  • Gowalla

    • UserkNN: neighbors: 146.0, similarity: 'cosine'
    • ItemkNN: neighbors: 508.0, similarity: 'dot'
    • RP3Beta: neighborhood: 777.0, alpha: 0.5663562161452378, beta: 0.001085447926739258, normalize_similarity: True
    • EASER: l2_norm: 15.930101258108873
  • Yelp 2018

    • UserkNN: neighbors: 146.0, similarity: 'cosine'
    • ItemkNN: neighbors: 144.0, similarity: 'cosine'
    • RP3Beta: neighborhood: 342.0, alpha: 0.7681732734954694, beta: 0.4181395996963926, normalize_similarity: True
    • EASER: l2_norm: 212.98774633994572
  • Amazon Book

    • UserkNN: neighbors: 146.0, similarity: 'cosine'
    • ItemkNN: neighbors: 125.0, similarity: 'cosine'
    • RP3Beta: neighborhood: 496.0, alpha: 0.44477903655656115, beta: 0.5968193614337285, normalize_similarity: True
    • EASER: N.A.

Extending the experimental comparison to new datasets (RQ3 — RQ4)

We report the configuration files (with hyper-parameter search spaces) for each model/dataset pair, covering the classic and graph CF baselines on Allrecipes and BookCrossing.

Model     Allrecipes  BookCrossing
MostPop   link        link
Random    link        link
UserkNN   link        link
ItemkNN   link        link
RP3Beta   link        link
EASER     link        link
NGCF      link        link
DGCF      link        link
LightGCN  link        link
SGL       link        link
UltraGCN  link        link
GFCF      link        link

The best hyper-parameters for each classic + graph CF model (as found in our experiments) are reported in the following:

  • Allrecipes

    • UserkNN: neighbors: 863.0, similarity: 'cosine'
    • ItemkNN: neighbors: 508.0, similarity: 'dot'
    • RP3Beta: neighborhood: 777.0, alpha: 0.5663562161452378, beta: 0.001085447926739258, normalize_similarity: True
    • EASER: l2_norm: 555344.9240485814
    • NGCF: lr: 0.0010492631473907471, epochs: 100, factors: 64, batch_size: 128, l_w: 0.08623551848300251, n_layers: 1, weight_size: 64, node_dropout: 0.5704755544541924, message_dropout: 0.37665593943318876, normalize: True
    • DGCF: lr: 0.000313132757493385, epochs: 10, factors: 64, batch_size: 256, l_w_bpr: 3.3519512293075625e-05, l_w_ind: 0.00021537560246909769, n_layers: 2, routing_iterations: 2, intents: 4
    • LightGCN: lr: 0.001, epochs: 10, factors: 64, batch_size: 256, l_w: 0.001288395174690605, n_layers: 4, normalize: True
    • SGL: lr: 0.001, epochs: 10, batch_size: 128, factors: 64, l_w: 1e-4, n_layers: 3, ssl_temp: 0.6492261261178492, ssl_reg: 0.012429441724966553, ssl_ratio: 0.2618285305261178492, sampling: nd
    • UltraGCN: lr: 1e-4, epochs: 240, factors: 64, batch_size: 128, g: 1e-4, l: 0.6421380210212072, w1: 0.026431283275666788, w2: 0.0006086626045670742, w3: 2.3712235041563928e-07, w4: 0.03156224646525972, ii_n_n: 10, n_n: 300, n_w: 300, s_s_p: False, i_w: 1e-4
    • GFCF: svd_factors: 256, alpha: 0.5477395514607551
  • BookCrossing

    • UserkNN: neighbors: 360.0, similarity: 'cosine'
    • ItemkNN: neighbors: 125.0, similarity: 'cosine'
    • RP3Beta: neighborhood: 777.0, alpha: 0.5663562161452378, beta: 0.001085447926739258, normalize_similarity: True
    • EASER: l2_norm: 97.97026620421359
    • NGCF: lr: 0.001313040990458504, epochs: 150, factors: 64, batch_size: 128, l_w: 0.007471352712353916, n_layers: 1, weight_size: 64, node_dropout: 0.6222126221705062, message_dropout: 0.2768938386628866, normalize: True
    • DGCF: lr: 0.00033659666428326467, epochs: 112, factors: 64, batch_size: 1024, l_w_bpr: 0.0005015002430942853, l_w_ind: 1.0625908485203885e-05, n_layers: 1, routing_iterations: 2, intents: 4
    • LightGCN: lr: 0.001, epochs: 160, factors: 64, batch_size: 256, l_w: 0.00128839517469060, n_layers: 4, normalize: True
    • SGL: lr: 0.001, epochs: 10, batch_size: 128, factors: 64, l_w: 1e-4, n_layers: 4, ssl_temp: 0.3831504020789032, ssl_reg: 0.14847461762325737, ssl_ratio: 0.18119634034037221, sampling: rw
    • UltraGCN: lr: 1e-4, epochs: 205, factors: 64, batch_size: 128, g: 1e-4, l: 2.1590977284940767, w1: 0.4071845141372458, w2: 2.674735729193082e-06, w3: 0.11655266791027195, w4: 0.05001575677944944, ii_n_n: 10, n_n: 300, n_w: 300, s_s_p: False, i_w: 1e-4
    • GFCF: svd_factors: 64, alpha: 0.4240013631942601

For RQ4, you need to generate the tsv files in which each user from the training set is assigned to one of the four quartiles. To do so, run the script ./quartiles_characteristics.py after changing the name of the dataset inside the script. This creates three tsv files for each dataset, one per hop (i.e., 1-hop, 2-hop, 3-hop). For convenience, we also provide these files in the same folders as Allrecipes and BookCrossing (see above).
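
The quartile assignment itself can be sketched as follows. This is not the code of ./quartiles_characteristics.py. It assumes each user has one numeric characteristic per hop and uses the number of training interactions as a stand-in for the 1-hop case, which is an assumption made only for illustration. The function names and the quartile labels 0 to 3 are hypothetical.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, Hashable, Iterable, Tuple


def user_degrees(interactions: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Count training interactions per user (stand-in characteristic)."""
    return dict(Counter(user for user, _item in interactions))


def assign_quartiles(values: Dict[Hashable, float]) -> Dict[Hashable, int]:
    """Map each user to a quartile index in {0, 1, 2, 3} by its characteristic value."""
    # Three cut points split the sorted values into four groups.
    q1, q2, q3 = quantiles(values.values(), n=4, method="inclusive")
    groups = {}
    for user, value in values.items():
        if value <= q1:
            groups[user] = 0
        elif value <= q2:
            groups[user] = 1
        elif value <= q3:
            groups[user] = 2
        else:
            groups[user] = 3
    return groups
```

Users whose value ties with a cut point fall into the lower group here. With heavily skewed characteristics this can make the groups uneven, so the group sizes are worth checking.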

Once all models have been trained, and tsv files for the user groups have been downloaded and correctly placed, you may want to generate the recommendation lists ONLY for the best hyper-parameter configuration for each model/dataset pair. This is done by setting the parameter save_recs: True in each configuration file.

Now, we are all set to calculate the nDCG on each user group. To do so, run the following script:

$ python3.8 -u start_experiments_user_groups.py \
    --dataset {dataset_name} \
    --hop {hop_number}
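
For readers who want to see what a per-group nDCG amounts to, the following is a minimal sketch with binary relevance. It is not the implementation used by start_experiments_user_groups.py or by Elliot. The function names, the in-memory dictionaries, and the default cutoff k=20 are assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set


def ndcg_at_k(ranked_items: Sequence[Hashable], relevant_items: Set[Hashable], k: int) -> float:
    """nDCG@k with binary relevance: a hit at 0-based rank r contributes 1 / log2(r + 2)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def ndcg_by_group(
    recs: Dict[Hashable, List[Hashable]],
    test: Dict[Hashable, Set[Hashable]],
    groups: Dict[Hashable, int],
    k: int = 20,
) -> Dict[int, float]:
    """Average nDCG@k over the users of each group."""
    per_group = defaultdict(list)
    for user, ranked in recs.items():
        if user in test and user in groups:
            per_group[groups[user]].append(ndcg_at_k(ranked, test[user], k))
    return {group: sum(scores) / len(scores) for group, scores in per_group.items()}
```

Here recs maps each user to their ranked recommendation list, test maps each user to their held-out items, and groups maps each user to a quartile. Users missing from either test or groups are skipped.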