Weird results on CGE@CGQA
Hi Authors,
Thank you for your great work! I tried to reproduce the results of CGE on the CGQA dataset by running "python train.py --config configs/cge/cgqa.yaml". However, the AUC on validation after running 1k+ epochs is still very low (~3e-5, far from that reported in the paper). I just cloned the repo and set up the dataset as instructed.
May I know if there is anything that I should do to successfully reproduce the results?
Attached is the log: logs.csv
Did you solve this issue? I have the same problem with the GCN code. Even with the GCN layers set to 2, the highest accuracy achieved is 2.6e-3.
Unfortunately, no.
Hello @daoyuan98 and @gulzainali98. I tried to replicate the issue in the past but did not manage to and, due to upcoming deadlines, we did not have the time to look into it further. I will try to replicate it again and let you know as soon as I find a solution.
Hello @mancinimassimiliano, thank you for the quick response. If you can share your config file, it'd be great. I noticed that gcn_nlayers was set to 10 in the original code https://github.com/ExplainableML/czsl/blob/main/configs/cge/cgqa.yml#L24. I believe the best-performing GCN in the paper had 2 layers. Maybe some other parameters in the code are also incorrect, leading to results that deviate from the paper.
Hello @gulzainali98, @daoyuan98.
I found that I made a naive mistake when I updated the CGQA dataset: I forgot to update the corresponding graph embeddings/connections. I updated the code, and now CGE can automatically construct the graph given an input dataset (which also answers #5).
I got performance for CGE on CGQA in line with what was reported. Could you please check if the updates fix the problem for you as well? (Please use the version of the code at the very latest commit; I realized not all changes had been pushed.)
p.s. @daoyuan98, I apologize for taking so long: I had the correct graph in my local paths and this is why I didn't spot the problem when I first tried.
p.p.s. @gulzainali98, thank you for spotting the "10" layers. That was actually an unused flag (active only for another type of GCN, GCNII). I took this opportunity to remove unused flags from the configs as well.
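For anyone wondering what "automatically construct the graph given an input dataset" could look like, here is a minimal sketch. It is not the code from the repo: the function name `build_composition_graph` and the node ordering are hypothetical, and it assumes the graph has one node per attribute, object, and attribute-object pair, with edges inside each (attribute, object, pair) triple plus self-loops.

```python
def build_composition_graph(pairs):
    """Build a dense adjacency matrix from (attribute, object) pairs.

    Hypothetical helper, not the repository's implementation.
    Assumed node order: attributes, then objects, then compositions.
    """
    pairs = sorted(set(pairs))
    attrs = sorted({a for a, _ in pairs})
    objs = sorted({o for _, o in pairs})

    attr_idx = {a: i for i, a in enumerate(attrs)}
    obj_idx = {o: len(attrs) + i for i, o in enumerate(objs)}
    pair_idx = {p: len(attrs) + len(objs) + i for i, p in enumerate(pairs)}

    n = len(attrs) + len(objs) + len(pairs)
    # Start from the identity so every node has a self-loop.
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

    for a, o in pairs:
        i, j, k = attr_idx[a], obj_idx[o], pair_idx[(a, o)]
        # Connect attribute-object, attribute-pair, and object-pair (undirected).
        for u, v in ((i, j), (i, k), (j, k)):
            adj[u][v] = 1.0
            adj[v][u] = 1.0

    return adj, attrs, objs, pairs


# Toy input; a real run would pass every pair of the dataset split files.
adj, attrs, objs, pairs = build_composition_graph(
    [("red", "apple"), ("green", "apple")]
)
```

The point of deriving everything from the pair list is that the node count and edges always match the dataset in use, so a stale graph file left over from an older dataset version cannot be loaded by accident.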
Hello @mancinimassimiliano, thank you for making the changes. I can confirm that by integrating the changes from your latest commit into my code, I have been able to get a maximum test AUC of around 4.0 (with the model selected at the maximum validation AUC), which is close to the one mentioned in the paper.
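To make the selection rule explicit (report the test AUC of the epoch with the highest validation AUC), here is a rough sketch. The keys `epoch`, `val_auc`, and `test_auc` are hypothetical and may not match the columns in the repo's logs, and the numbers are placeholders rather than real results.

```python
def select_by_validation(history):
    """Return the log row with the highest validation AUC.

    `history` is a list of dicts; the keys used here are assumptions
    and may need renaming to match the actual log format.
    """
    if not history:
        raise ValueError("history is empty")
    return max(history, key=lambda row: row["val_auc"])


# Placeholder values for illustration only, not measured results.
history = [
    {"epoch": 1, "val_auc": 0.1, "test_auc": 0.2},
    {"epoch": 2, "val_auc": 0.5, "test_auc": 0.4},
    {"epoch": 3, "val_auc": 0.3, "test_auc": 0.6},
]
best = select_by_validation(history)
reported_test_auc = best["test_auc"]
```

Note that epoch 3 has the highest test AUC in this toy log, but it is not chosen; picking the epoch by test AUC directly would leak test information into model selection.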