Reproducing results on Deezer dataset

Question


Chenhui1016 opened this issue 2 years ago · 5 comments

Hi there,

Thanks for the fantastic work!

I'm running run.sh and found that the metric for Deezer is set to "rocauc", while the paper uses accuracy as the metric on Deezer (shown in Figure 2). When I change the metric from "rocauc" to "acc" in run.sh, the averaged accuracy is 65.12%, which is much lower than the accuracy reported in the paper (~71%). Could you let me know the proper hyperparameter settings for reproducing the results on Deezer? Thanks in advance!

Answer 1 · 2023-01-20T03:44:25.000Z

Hi Chenhui,

The metric we used for Deezer is rocauc (since this dataset is a binary classification task), and the results can be reproduced using the hyperparameters provided in run.sh. After double-checking, the scores plotted for Deezer in Figure 2 are indeed rocauc and are correct. We have fixed the y-axis label for Deezer in a new version uploaded to ResearchGate. Sorry for the confusion.
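To see why the metric label matters so much, it helps to remember that accuracy and ROC-AUC can diverge sharply on the same binary predictions. The sketch below is a hand-rolled, stdlib-only illustration. It is not NodeFormer's evaluation code, and the toy labels and scores are invented. Accuracy thresholds the scores at an assumed cutoff of 0.5, while ROC-AUC only measures how well positives are ranked above negatives.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: the ranking is perfect, but every score sits below 0.5.
    y = [0, 0, 1, 1]
    s = [0.10, 0.20, 0.30, 0.40]
    print(f"acc    = {accuracy(y, s):.2f}")  # prints "acc    = 0.50"
    print(f"rocauc = {roc_auc(y, s):.2f}")  # prints "rocauc = 1.00"
```

The pairwise loop is O(P·N) and only meant for small examples. A library routine would be the practical choice on a real dataset.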

Answer 2 · 2023-01-20T18:00:52.000Z

Thanks for your prompt reply! I can reproduce NodeFormer's rocauc score on Deezer.

Btw, I got a rocauc of 72.42 ± 0.38% on ogbn-proteins using the hyperparameters in run.sh, which is lower than the 77.45 ± 1.15% reported in the paper. Could you check the default hyperparameters for ogbn-proteins in run.sh? Thanks again for your help!
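Scores like "72.42 ± 0.38%" are typically the mean and standard deviation over several independent runs. The thread does not say how many runs run.sh performs or whether the sample or population standard deviation is reported, so the sketch below assumes the sample version. The per-run numbers in it are hypothetical, not results from this thread.

```python
import statistics
from typing import Sequence


def summarize_runs(scores: Sequence[float]) -> str:
    """Format per-run scores (in %) as 'mean ± std' using the sample standard deviation."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample std (n - 1); statistics.pstdev gives population std
    return f"{mean:.2f} ± {std:.2f}"


if __name__ == "__main__":
    # Hypothetical per-seed test ROC-AUC values, for illustration only.
    runs = [72.0, 72.5, 73.0]
    print(summarize_runs(runs))  # prints "72.50 ± 0.50"
```

For scale, the gap between 72.42 and 77.45 is about 5 points, several times larger than either reported standard deviation. That suggests a systematic difference rather than ordinary run-to-run noise.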

Answer 3 · 2023-01-21T11:24:47.000Z

I just checked this, and we do use the hyperparameters in run.sh for ogbn-proteins. We run the model on an RTX 2080 Ti and can reproduce the result. Which GPU are you using for your experiments?

Answer 4 · 2023-01-21T20:16:57.000Z

I'm using an RTX A6000. Got it, thx!

Answer 5 · 2023-02-24T09:13:43.000Z

Hi Chenhui,

I have added model checkpoints and code for testing on ogbn-proteins, which should make it easier to reproduce the score in our paper (77+) if you need it for further research.

The score of 72 you obtained may be because your installed package versions differ from ours (see requirements.txt for details). You may want to check this if you need to reproduce the results by training from scratch.
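One quick way to check for such version drift is to compare the exact pins in requirements.txt against what is installed. The sketch below is a stdlib-only helper, not something shipped with the repository. It assumes pins are written as `name==version`, skips range specifiers, and compares package names exactly as written without normalizing case or `-`/`_`. The actual contents of the project's requirements.txt are not shown in this thread.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Extract 'name==version' pins from requirements.txt content."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments and env markers
        if "==" not in line:
            continue  # unpinned or range specifiers are ignored in this sketch
        name, ver = line.split("==", 1)
        pins[name.strip()] = ver.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, pinned, installed) for every pin not matched exactly."""
    mismatches = []
    for name, wanted in pins.items():
        have = lookup(name)
        if have != wanted:
            mismatches.append((name, wanted, have))
    return mismatches


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "requirements.txt"
    try:
        with open(path, encoding="utf-8") as f:
            pins = parse_pins(f.read())
    except FileNotFoundError:
        print(f"{path} not found")
    else:
        for name, wanted, have in find_mismatches(pins):
            print(f"{name}: pinned {wanted}, installed {have or 'missing'}")
```

Run it from the repository root as `python check_pins.py` (the script name is hypothetical). An empty output means every exact pin matches the current environment.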