
Enquiry on paper "Catastrophic Forgetting in Deep Graph Networks: an Introductory Benchmark for Graph Classification"

I recently came across the paper "Catastrophic Forgetting in Deep Graph Networks: an Introductory Benchmark for Graph Classification", found the linked GitHub repository, and attempted to run the corresponding experiments.

However, I could not find the config_LWF_Split_GraphSAGE_OGB.yml file mentioned in the readme. Is this file available somewhere, or has it been renamed?

I also attempted to re-run the baselines using config_Rehearsal_Split_Baseline.yml and config_LWF_Split_Baseline.yml, which resulted in 19.80 and 17.43 respectively.
After changing the n_rehearsal_patterns_per_task argument in config_Rehearsal_Split_Baseline.yml to 100, I obtained 26.21, but I am still unable to reach the reported result of 42.87.

Additionally, when looking at the winner_config.json for the modified config_Rehearsal_Split_Baseline.yml, I noticed that config 2 was selected.
However, when inspecting the config_result.json files, I noticed that config 2 had a validation score of ~24.68, while config 4 had a higher validation score of ~30.03.
Could this be caused by running the code with the --debug command-line flag?
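
For reference, the kind of cross-check I did by hand can be sketched roughly as below. This is only an illustrative sketch, not the repository's own model-selection code: the file pattern "config_result*.json", the JSON key "validation_score", and the assumption that each config's results sit in their own sub-folder are all hypothetical and would need to be adapted to the actual layout of the results folder. It also assumes a higher validation score is better.

```python
import json
from pathlib import Path


def load_scores(root, pattern="config_result*.json", key="validation_score"):
    """Collect {config folder name: validation score} for result files under root.

    `pattern` and `key` are hypothetical placeholders; adjust them to match
    the real file names and JSON fields in the results folder.
    """
    scores = {}
    for path in sorted(Path(root).rglob(pattern)):
        with open(path) as f:
            data = json.load(f)
        # Skip files that do not expose the assumed top-level key.
        if isinstance(data, dict) and key in data:
            scores[path.parent.name] = float(data[key])
    return scores


def best_config(scores, higher_is_better=True):
    """Return the name of the config with the best validation score."""
    if not scores:
        raise ValueError("no validation scores found")
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)
```

Calling best_config(load_scores("RESULTS_FOLDER")) (with a placeholder folder name) and comparing the returned name against the entry in winner_config.json would show whether the selected config really is the one with the best validation score.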

I then also ran config_Rehearsal_Split_DGNReg.yml using the following command:

    python launch_experiment.py --config-file CONFIGS/config_Rehearsal_Split_DGNReg.yml --splits-folder SPLITS/ --data-splits SPLITS/CIFAR10/CIFAR10_outer1_inner1.splits --data-root DATA/ --dataset-name CIFAR10 --dataset-class data.dataset.GNNBenchmarkDataset --max-cpus 4 --max-gpus 1 --final-training-runs 5 --result-folder RESULTS_CIFAR10_Rehearsal_DGNReg --debug

This gave 33.28, and I am unsure what would need to be modified to reach the reported value of 46.61.

I would appreciate any guidance on adjustments that could bring my results closer to the paper's values, and I would be glad to provide any further details required.
