Problem with experiment results on the Polblogs dataset

Question


PatPathead opened this issue 3 years ago · 7 comments

Hi, I found a problem with the Polblogs dataset: I cannot fully reproduce the experimental results under the same random seed. I tested on the Metattack-perturbed graph at a 10% perturbation rate, but I cannot reproduce all the results consistently.

With random seed 10:
GCN: 0.8680981595092024
RGCN: 0.8629856850715747
ProGNN: 0.82719836400818
ProGNN-fs: 0.8384458077709612

The ProGNN and ProGNN-fs results are consistent with your paper.

With random seed 15:
GCN: 0.7198364008179959
RGCN: 0.7157464212678937
ProGNN: 0.7147239263803682
ProGNN-fs: 0.7157464212678937

The GCN and RGCN results are consistent with your paper.

My ProGNN parameter settings on the Polblogs dataset are below (all code is based on DeepRobust):
args.epochs = 1200
args.gamma = 1
args.alpha = 5e-4
args.beta = 1.5
args.lambda_ = 0
args.lr = 5e-4
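For reference, the same settings could be exposed as command-line flags so that the exact configuration is recorded with each result. This is a minimal argparse sketch; the flag names simply mirror the args.* attributes listed above and are an assumption, so check Pro-GNN's own train.py and polblogs_meta.sh for the real names and defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the ProGNN settings listed above."""
    parser = argparse.ArgumentParser(description="ProGNN on Polblogs (sketch)")
    parser.add_argument("--epochs", type=int, default=1200)
    # alpha, beta, gamma and lambda_ weight the terms of ProGNN's objective
    parser.add_argument("--gamma", type=float, default=1.0)
    parser.add_argument("--alpha", type=float, default=5e-4)
    parser.add_argument("--beta", type=float, default=1.5)
    parser.add_argument("--lambda_", type=float, default=0.0)  # dest stays "lambda_"
    parser.add_argument("--lr", type=float, default=5e-4)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([])  # empty list: use the defaults only
    print(vars(args))
```

Calling parse_args([]) returns just the defaults, which makes it easy to print the configuration next to each reported accuracy.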

If I am not mistaken, I suspect you ran the experiments with different random seeds. Could you help me check this when you are available?
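"The same random seed" only makes runs comparable if every random number generator the script draws from is seeded. A minimal sketch of such a helper (the name set_all_seeds is hypothetical; PyTorch is treated as optional so the snippet also runs without it):

```python
import random

import numpy as np

try:
    import torch
except ImportError:  # the sketch still runs without PyTorch installed
    torch = None


def set_all_seeds(seed: int) -> None:
    """Seed Python's, NumPy's and (if available) PyTorch's RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
```

Even with all seeds fixed, some CUDA kernels are nondeterministic, so small run-to-run differences on a GPU can remain.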

Thanks in advance!

Answer 1 · 2021-07-11T14:39:13.000Z

Hi,

(1) For Pro-GNN on Polblogs, please see the script polblogs_meta.sh.

(2) As for GCN, the variance of its performance should not be that large (71.9% to 86.8% in your case). I suspect that with random seed 10 you are not using the same data splits for the attack and the defense. The splits used for the attack (Metattack) and for the defense are supposed to be identical; otherwise the defense accuracy can come out much higher.
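The effect of mismatched splits can be illustrated with a toy example. The sketch below uses a hypothetical random_split helper (not DeepRobust's own splitting code); the 10%/10%/80% ratios and the node count of 1,222 are assumptions for illustration. It shows that two seeds produce different splits, so some nodes the defense is tested on were labeled training nodes from the attacker's point of view.

```python
import numpy as np


def random_split(n_nodes: int, seed: int, train: float = 0.1, val: float = 0.1):
    """Hypothetical seed-dependent split: 10% train, 10% val, the rest test."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_nodes)
    n_train, n_val = int(train * n_nodes), int(val * n_nodes)
    idx_train = np.sort(perm[:n_train])
    idx_val = np.sort(perm[n_train:n_train + n_val])
    idx_test = np.sort(perm[n_train + n_val:])
    return idx_train, idx_val, idx_test


n_nodes = 1222  # assumed size of the Polblogs graph used in these experiments

# Split the attacker optimized against (seed 15) vs. the split the defender evaluates on (seed 10).
attack_train, _, attack_test = random_split(n_nodes, seed=15)
defense_train, _, defense_test = random_split(n_nodes, seed=10)

# Defender test nodes that were labeled (training) nodes for the attacker.
overlap = np.intersect1d(defense_test, attack_train)
print(f"{len(defense_test)} test nodes, {len(overlap)} of them were attacker training nodes")
```

Metattack computes its perturbations with respect to its own labeled and unlabeled node sets, so evaluating on a different split can make the attacked graph look easier than it really is. That would be one explanation for the unusually high accuracies with seed 10.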

To address this, use the latest code in train.py with setting='prognn', which ensures the data splits are the same. See more details here.

Answer 2 · 2021-07-11T14:53:16.000Z

Thanks for your reply. Actually, I noticed that polblogs_meta.sh sets the random seed to 10, whereas the attacked Polblogs graph you provide in DeepRobust was generated with random seed 15. I took a screenshot of this. I think this discrepancy is consistent with my results.

If ProGNN is run with random seed 10 on a graph that was attacked with random seed 15, I think that would cause the problem described above.

Answer 3 · 2021-07-11T14:59:51.000Z

I also re-ran Metattack with random seed 15 and got the following results:
GCN: 0.820040899795501
RGCN: 0.8169734151329244
ProGNN: 0.9243353783231085

I think this is consistent with my observation. I am not sure whether the attacked Polblogs graph you provide was actually poisoned with random seed 10.

Thanks!

Answer 4 · 2021-07-11T16:34:50.000Z

Are you using the latest code? If you use the following code to load the data splits, the splits are always fixed and identical to the ones used in the attack.

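    from deeprobust.graph.data import Dataset
    # data = Dataset(root='/tmp/', name=args.dataset, setting='nettack', seed=15)  # splits drawn at random from the seed
    data = Dataset(root='/tmp/', name=args.dataset, setting='prognn')  # fixed splits, same as the attack's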

Answer 5 · 2021-07-11T18:05:37.000Z

Hi, I have set it up following your instructions:

    import numpy as np
    import torch
    from deeprobust.graph.data import Dataset, PrePtbDataset
    from deeprobust.graph.utils import preprocess

    # Clean graph with the fixed Pro-GNN splits
    data = Dataset(root='data/', name=args.dataset, setting='prognn')
    adj, features, labels = data.adj, data.features, data.labels
    idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
    idx_unlabeled = np.union1d(idx_val, idx_test)
    idx_all = np.union1d(idx_train, idx_unlabeled)
    adj, features, labels = preprocess(adj, features, labels, preprocess_adj=False)

    # Pre-attacked (Metattack) graph provided by DeepRobust
    perturbed_data = PrePtbDataset(root='data/', name=args.dataset, attack_method='meta', ptb_rate=args.ptb_rate)
    modified_adj = torch.FloatTensor(perturbed_data.adj.todense())
    modified_features = features
    print('Downloaded successfully!')

I ran it three times, and the results are not much different from what I observed before:

GCN: 0.7208588957055214
RGCN: 0.7085889570552147
ProGNN: 0.7668711656441719

GCN: 0.7361963190184049
RGCN: 0.7147239263803682
ProGNN: 0.7269938650306749

GCN: 0.7137014314928425
RGCN: 0.7075664621676893
ProGNN: 0.7269938650306749
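To compare repeated runs like these with the numbers reported in a paper, it helps to summarize them as mean ± standard deviation. A minimal sketch using Python's standard statistics module on the three runs above (the helper name summarize is hypothetical):

```python
from statistics import mean, stdev

# Test accuracies from the three runs reported above.
runs = {
    "GCN": [0.7208588957055214, 0.7361963190184049, 0.7137014314928425],
    "RGCN": [0.7085889570552147, 0.7147239263803682, 0.7075664621676893],
    "ProGNN": [0.7668711656441719, 0.7269938650306749, 0.7269938650306749],
}


def summarize(results):
    """Return {model: (mean, sample standard deviation)} of per-run accuracies."""
    return {model: (mean(accs), stdev(accs)) for model, accs in results.items()}


for model, (mu, sd) in summarize(runs).items():
    print(f"{model}: {100 * mu:.2f} ± {100 * sd:.2f}")
```

statistics.stdev gives the sample standard deviation, while statistics.pstdev gives the population version, so check which one a paper reports before comparing.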

Also, the split that Pro-GNN provides was not generated with random seed 15, so I suspect it may not be consistent with the attacked version.

Thanks!

Answer 6 · 2021-07-11T20:27:36.000Z

Hi, first I want to point out that if we use data = Dataset(root='data/', name=args.dataset, setting='prognn') to load the data, the given random seed does not affect the loaded data splits.
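The seed does not matter here because the indices come from a file rather than from a seeded random permutation. A minimal sketch of that behaviour follows. The function name load_fixed_splits, the generalized file-name pattern, and the JSON keys idx_train/idx_val/idx_test are assumptions for illustration, not DeepRobust's actual implementation; only the file name polblogs_prognn_splits.json comes from this thread.

```python
import json
import os

import numpy as np


def load_fixed_splits(root: str, name: str, seed=None):
    """Load precomputed train/val/test indices; `seed` is deliberately ignored.

    Assumes a JSON file f"{name}_prognn_splits.json" in `root` with the keys
    "idx_train", "idx_val" and "idx_test" (hypothetical layout).
    """
    path = os.path.join(root, f"{name}_prognn_splits.json")
    if not os.path.exists(path):
        raise FileNotFoundError(f"missing fixed splits file: {path}")
    with open(path) as f:
        splits = json.load(f)
    return tuple(np.asarray(splits[key]) for key in ("idx_train", "idx_val", "idx_test"))
```

Calling it with seed=10 or seed=15 returns exactly the same splits, and a missing splits file raises an error rather than going unnoticed.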

I just ran several seeds and found that GCN achieves around 69% accuracy on Polblogs with a 15% Metattack perturbation rate (ProGNN around 85%). So I think you are still not loading the data splits correctly (check whether your folder contains the file polblogs_prognn_splits.json). I would suggest that you first reinstall DeepRobust:

git clone https://github.com/DSE-MSU/DeepRobust.git
cd DeepRobust
python setup.py install

Then, in a new folder, clone the latest Pro-GNN and run the scripts:

git clone https://github.com/ChandlerBang/Pro-GNN.git
cd Pro-GNN
sh scripts/meta/cora_meta.sh
sh scripts/meta/gcn.sh

Let me know if you have any other questions.

Answer 7 · 2021-07-12T02:48:37.000Z

Thanks for your patience! I have solved this problem.