Cannot reproduce the test accuracy of GRAIN (ball-D)
DAIZHENWEI opened this issue · 4 comments
Hi, I tried to run your code in ./examples/Test.ipynb, but I cannot reproduce the test accuracy of GRAIN (ball-D). The performance is about 3% lower than that shown in your paper. Could you check whether the parameters in your file are correct? I attached a figure showing my results.
Thanks for your attention!
The test accuracy of the learning-based GNN model may be lower than the validation accuracy, which leads to unstable performance. Therefore, we report the mean test accuracy in our work.
Note that the node selection process in Grain is parameter-free, and the unstable performance mainly comes from the training of the GCN model. We have added the early-stopping trick for training in the file early_stop_training.py. For fairness, all the baselines are trained with the same Python file, since selecting nodes and training the model are two independent stages.
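For reference, a patience-based early-stopping helper behaves roughly as follows (a minimal sketch only; the actual class in early_stop_training.py may differ):

class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience      # epochs to wait after the last improvement
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def __call__(self, val_acc):
        # Reset the counter on improvement; otherwise count toward stopping.
        if self.best_score is None or val_acc > self.best_score:
            self.best_score = val_acc
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True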
We hope this will solve your problem.
Hi Wt, thank you for making the code public. Based on my understanding, the evaluation code in early_stop_training.py is:
print('xxxxxxxxxx Evaluation begin xxxxxxxxxx')
t_total = time.time()
record = {}
for i in range(500):
    model = GCN(nfeat=features_GCN.shape[1],
                nhid=hidden_size,
                nclass=labels.max().item() + 1,
                dropout=0.85)
    model.cuda()
    early_stopping = EarlyStopping(patience=10)
    optimizer = optim.Adam(model.parameters(),
                           lr=0.05, weight_decay=5e-4)
    for epoch in range(400):
        train(epoch, model, record)
        if early_stopping.early_stop == True:
            break
bit_list = sorted(record.keys())
bit_list.reverse()
for key in bit_list[:10]:
    value = record[key]
    print(round(key, 3), round(value, 3))
print('xxxxxxxxxx Evaluation end xxxxxxxxxx')
This seems to report the top-10 performers among the 500 runs rather than the average. Is this correct? Thank you.
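For comparison, if record maps each run's best validation accuracy to its test accuracy (as the printing loop suggests), the mean over all runs could be computed roughly like this (my own sketch, not code from the repository):

import numpy as np

# Hypothetical: average the test accuracies over all runs instead of
# printing only the ten entries with the highest validation accuracy.
test_accs = list(record.values())
print('mean test acc: %.3f +/- %.3f' % (np.mean(test_accs), np.std(test_accs)))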
Thank you for your attention to our work. We do not simply select the best result on the validation set. In fact, we select the top 10 validation accuracies to ensure the stability of early stopping (we use early stopping so that the validation and test results are nearly the same; otherwise the latter can fall significantly below the former).
Do not worry about fairness: all the baselines (e.g., AGE, ANRMAB) use the same technique. That is why both AGE and ANRMAB appear better here than their original papers report.
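If this reading is correct, the reported figure corresponds to averaging the test accuracies of the runs with the ten highest validation accuracies, roughly as follows (a sketch reusing the record dict from the evaluation code above, not the exact reporting script):

import numpy as np

# Sketch: keep the 10 runs with the highest validation accuracy and
# average their test accuracies (record maps validation acc -> test acc).
top_val = sorted(record.keys(), reverse=True)[:10]
print('mean over top-10 runs: %.3f' % np.mean([record[k] for k in top_val]))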
Thank you for your swift response. Would it be possible to share or upload the generated "GRAIN(ball-D)_r0.05_cora_selected_nodes.json" file to the repository? I obtained the file by running Ball.py and just want to make sure I got the correct results. Thanks!