THUMNLab/AutoGL

[Internal] Performance Consistency Check Leaderboard

Frozenmad opened this issue · 10 comments

This issue tracks whether models run through the library reach the same performance as their native implementations.

WARNING: These are not the evaluation results of this library. To benchmark AutoGL, please see the provided examples.

Guide to developers

What do we mean when we are checking performance?

First, remember that a performance inconsistency is not necessarily caused by our implementation. Sometimes you need to increase the number of repeats, or change the range of seeds, to see whether the results match under the "same" setting.

If the rules above do not apply, check carefully whether your code contains unintended behavior. The performance check scripts themselves may also be incorrect; in that case, please point it out to @Frozenmad.
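A minimal sketch of the comparison described above. The `fake_base` and `fake_wrapped` functions are hypothetical stand-ins for "train once with this seed and return the test accuracy"; they are not AutoGL APIs. The population standard deviation and the tolerance are assumptions as well, since the check scripts' exact choices are not stated here.

```python
import random
import statistics
import time


def summarize(run_once, seeds):
    """Run `run_once(seed)` for every seed; return (mean, std, seconds per run)."""
    start = time.perf_counter()
    scores = [run_once(seed) for seed in seeds]
    sec_per_it = (time.perf_counter() - start) / len(seeds)
    # Population std is an assumption; the real scripts may use sample std.
    return statistics.mean(scores), statistics.pstdev(scores), sec_per_it


def format_cell(mean, std, sec_per_it):
    """Format a result in the 'mean ~ std (time per run)' style used in the tables."""
    return f"{mean:.2f} ~ {std:.2f} ({sec_per_it:.2f}s/it)"


def consistent(a, b, tol=0.005):
    """Two stages match if mean and std agree within `tol` (an assumed tolerance)."""
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol


# Hypothetical stand-ins for a native model and its library-wrapped version.
def fake_base(seed):
    return 78.0 + random.Random(seed).gauss(0.0, 1.0)


def fake_wrapped(seed):
    return fake_base(seed)  # same seed, same computation, same score


seeds = range(10)  # corresponds to --repeat 10
base = summarize(fake_base, seeds)
wrapped = summarize(fake_wrapped, seeds)
print(format_cell(*base))
print("consistent:", consistent(base, wrapped))
```

If the two summaries disagree, widening `seeds` (more repeats or a different seed range) is the first thing to try before suspecting the implementation.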

Note

All performance check results are listed below. Performance inconsistencies are shown in bold in the tables.
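Bold markers can be lost when the tables are copied as plain text, so the same check can also be done mechanically. The following is only a sketch, assuming each cell has the form `mean ~ std (time s/it)`; the helper names are hypothetical and not part of the repository.

```python
import re

# One table cell: "<mean> ~ <std> (<seconds>s/it)", tolerant of missing spaces.
CELL = re.compile(r"(\d+\.\d+)\s*~\s*(\d+\.\d+)\s*\((\d+\.\d+)s/it\)")


def parse_row(line):
    """Split a row into (label, [(mean, std), ...]); timings are ignored."""
    first = CELL.search(line)
    label = line[: first.start()].strip() if first else line.strip()
    cells = [(float(m), float(s)) for m, s, _ in CELL.findall(line)]
    return label, cells


def inconsistent_columns(base_line, other_line):
    """Return indices of columns whose mean or std differs from the base row."""
    _, base = parse_row(base_line)
    _, other = parse_row(other_line)
    return [i for i, (b, o) in enumerate(zip(base, other)) if b != o]


base = "base - sage 81.23 ~ 0.52 (1.20s/it) 69.51 ~ 1.12 (1.19s/it) 76.25 ~ 0.43 (1.27s/it)"
model = "model - sage 81.23 ~ 0.52 (1.19s/it) 69.50 ~ 1.14 (1.18s/it) 76.25 ~ 0.43 (1.27s/it)"
print(inconsistent_columns(base, model))  # [1]
```

The comparison is exact on the printed two-decimal values, so it flags any visible difference; whether a difference of that size matters is a separate judgment.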

[DGL] Homogeneous Node Classification

Starting cmd:

python test/performance/node_classification/dgl/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer

Environment: Tesla V100S-PCIE-32GB

model cora citeseer pubmed
base - gcn 77.93 ~ 1.43 (1.57s/it) 63.59 ~ 1.81 (1.56s/it) 75.91 ~ 0.73 (1.67s/it)
model - gcn 77.93 ~ 1.43 (1.63s/it) 63.59 ~ 1.81 (1.70s/it) 75.91 ~ 0.73 (1.76s/it)
model (decouple) - gcn 77.93 ~ 1.43 (1.60s/it) 63.59 ~ 1.81 (1.58s/it) 75.91 ~ 0.73 (1.59s/it)
trainer - gcn 77.93 ~ 1.43 (1.94s/it) 63.59 ~ 1.81 (1.96s/it) 75.91 ~ 0.73 (2.02s/it)
trainer + dataset - gcn 77.93 ~ 1.43 (1.97s/it) 63.59 ~ 1.81 (1.97s/it) 75.91 ~ 0.73 (1.96s/it)
solver - gcn 77.93 ~ 1.43 (2.04s/it) 63.59 ~ 1.81 (1.99s/it) 75.91 ~ 0.73 (2.00s/it)
base - gat 81.41 ~ 0.80 (2.21s/it) 67.51 ~ 1.03 (2.29s/it) 75.55 ~ 0.91 (2.35s/it)
model - gat 81.41 ~ 0.80 (2.39s/it) 67.51 ~ 1.03 (2.35s/it) 75.55 ~ 0.91 (2.53s/it)
model (decouple) - gat 81.41 ~ 0.80 (2.20s/it) 67.51 ~ 1.03 (2.53s/it) 75.55 ~ 0.91 (2.38s/it)
trainer - gat 81.41 ~ 0.80 (2.83s/it) 67.51 ~ 1.03 (2.90s/it) 75.55 ~ 0.91 (2.94s/it)
trainer + dataset - gat 81.41 ~ 0.80 (2.85s/it) 67.51 ~ 1.03 (2.92s/it) 75.55 ~ 0.91 (3.04s/it)
solver - gat 81.41 ~ 0.80 (2.95s/it) 67.51 ~ 1.03 (2.84s/it) 75.55 ~ 0.91 (3.05s/it)
base - sage 81.23 ~ 0.52 (1.20s/it) 69.51 ~ 1.12 (1.19s/it) 76.25 ~ 0.43 (1.27s/it)
model - sage 81.23 ~ 0.52 (1.19s/it) 69.50 ~ 1.14 (1.18s/it) 76.25 ~ 0.43 (1.27s/it)
model (decouple) - sage 81.23 ~ 0.52 (1.19s/it) 69.50 ~ 1.14 (1.27s/it) 76.25 ~ 0.43 (1.34s/it)
trainer - sage 81.23 ~ 0.52 (1.55s/it) 69.50 ~ 1.14 (1.58s/it) 76.25 ~ 0.43 (1.67s/it)
trainer + dataset - sage 81.23 ~ 0.52 (1.53s/it) 69.50 ~ 1.14 (1.58s/it) 76.25 ~ 0.43 (1.65s/it)
solver - sage 81.23 ~ 0.52 (1.57s/it) 69.50 ~ 1.14 (1.61s/it) 76.25 ~ 0.43 (1.64s/it)

[PyG] Homogeneous Node Classification

Starting cmd:

python test/performance/node_classification/pyg/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer

Environment: Tesla V100S-PCIE-32GB

model cora citeseer pubmed
base - gcn 79.92 ~ 0.45 (1.15s/it) 67.13 ~ 1.71 (1.12s/it) 76.74 ~ 0.36 (1.12s/it)
model - gcn 79.92 ~ 0.45 (1.12s/it) 67.13 ~ 1.71 (1.13s/it) 76.74 ~ 0.36 (1.16s/it)
model (decouple) - gcn 79.92 ~ 0.45 (1.14s/it) 67.13 ~ 1.71 (1.16s/it) 76.74 ~ 0.36 (1.22s/it)
trainer - gcn 79.93 ~ 0.45 (1.42s/it) 67.13 ~ 1.71 (1.43s/it) 76.74 ~ 0.36 (1.47s/it)
trainer + dataset - gcn 79.92 ~ 0.45 (1.43s/it) 67.13 ~ 1.71 (1.42s/it) 76.74 ~ 0.36 (1.42s/it)
solver - gcn 79.92 ~ 0.45 (1.53s/it) 67.13 ~ 1.71 (1.60s/it) 76.74 ~ 0.36 (1.53s/it)
base - gat 81.80 ~ 1.24 (1.73s/it) 70.75 ~ 0.85 (1.94s/it) 76.65 ~ 1.02 (1.86s/it)
model - gat 81.80 ~ 1.24 (1.76s/it) 70.75 ~ 0.85 (1.82s/it) 76.65 ~ 1.02 (1.87s/it)
model (decouple) - gat 81.80 ~ 1.24 (1.80s/it) 70.75 ~ 0.85 (1.78s/it) 76.65 ~ 1.02 (2.05s/it)
trainer - gat 81.80 ~ 1.24 (2.31s/it) 70.75 ~ 0.85 (2.28s/it) 76.65 ~ 1.02 (2.40s/it)
trainer + dataset - gat 81.80 ~ 1.24 (2.30s/it) 70.75 ~ 0.85 (2.31s/it) 76.65 ~ 1.02 (2.39s/it)
solver - gat 81.80 ~ 1.24 (2.05s/it) 70.75 ~ 0.85 (2.24s/it) 76.65 ~ 1.02 (2.33s/it)
base - sage 78.21 ~ 0.60 (1.14s/it) 67.24 ~ 0.99 (1.18s/it) 75.61 ~ 0.53 (1.34s/it)
model - sage 78.21 ~ 0.60 (1.05s/it) 67.24 ~ 0.99 (1.24s/it) 75.61 ~ 0.53 (1.35s/it)
trainer - sage 78.21 ~ 0.60 (1.24s/it) 67.24 ~ 0.99 (1.48s/it) 75.61 ~ 0.53 (1.63s/it)
trainer + dataset - sage 78.21 ~ 0.60 (1.24s/it) 67.24 ~ 0.99 (1.48s/it) 75.62 ~ 0.51 (1.62s/it)
solver - sage 78.21 ~ 0.60 (1.30s/it) 67.24 ~ 0.99 (1.67s/it) 75.62 ~ 0.51 (1.77s/it)

[DGL] Heterogeneous Node Classification

Starting cmd:

python test/performance/node_classification/dgl/hetero_xxx.py --model xxx --repeat 10 --dataset xxx

Environment: [fill this env]

model ACM ACM3025 xxx
base - hgt 0.4025 ~ 0.0055 (119.67s/it)
model - hgt 0.4007 ~ 0.0051 (119.35s/it)
trainer - hgt 0.3946 ~ 0.0067 (33.49s/it)
trainer + dataset - hgt
solver - hgt
base - heteroRGCN 0.4033 ~ 0.0013 (16.20s/it)
model - heteroRGCN 0.4043 ~ 0.0015 (14.90s/it)
trainer - heteroRGCN 0.3995 ~ 0.0015 (7.20s/it)
trainer + dataset - heteroRGCN
solver - heteroRGCN
base - han 0.9123 ~ 0.0072 (43.62s/it) 0.8655 ~ 0.0139 (44.58s/it)
model - han 0.9000 ~ 0.0099 (159.69s/it)
trainer - han 0.9048 ~ 0.0055 (92.51s/it)
trainer + dataset - han
solver - han

[PyG] Homogeneous Graph Classification

Starting cmd:

python test/performance/graph_classification/pyg/xxx.py --model gin --repeat 10 --dataset MUTAG/COLLAB/IMDBBINARY

Environment: Tesla V100S-PCIE-32GB

model MUTAG COLLAB IMDBBINARY
base - gin 82.31 ~ 8.63 (6.68s/it)
model - gin 89.23 ~ 4.49 (5.87s/it)
trainer - gin 80.00 ~ 7.05 (6.14s/it)
trainer + dataset - gin 76.92 ~ 7.50 (5.42s/it)
solver - gin 90.38 ~ 6.26 (5.91s/it)

Environment: Tesla V100S-PCIE-32GB
Because of the randomness in PyG, results over 100 repeats are reported below:

model MUTAG COLLAB IMDBBINARY
base - gin 85.27 ~ 6.66 (4.08s/it)
model - gin 88.77 ~ 5.64 (4.79s/it)
trainer - gin 81.04 ~ 9.28 (5.41s/it)
trainer + dataset - gin 80.42 ~ 9.56 (5.24s/it)
solver - gin 88.69 ~ 7.11 (5.05s/it)
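The reason for raising the repeat count can be made concrete. If a single run has standard deviation σ, the standard error of the mean over n runs is σ/√n, so more repeats give a tighter estimate of the mean. A rough sketch using the `base - gin` deviations from the two tables above, and assuming the runs are independent:

```python
import math


def standard_error(std, repeats):
    """Standard error of the mean for `repeats` independent runs."""
    return std / math.sqrt(repeats)


print(round(standard_error(8.63, 10), 2))   # 10 repeats  -> 2.73
print(round(standard_error(6.66, 100), 2))  # 100 repeats -> 0.67
```

With 10 repeats, differences of a few points between stages are within the noise of the mean; with 100 repeats, gaps of that size are harder to attribute to chance alone.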

[PyG] NAS

Environment: GeForce GTX TITAN X

space algorithm cora citeseer pubmed
graphnas graphnas 81.5 70.8
graphnas random 81.1 70.0
singlepath enas 82.3 69.9
singlepath darts 81.9 72.2

[DGL] Homogeneous Graph Classification

Starting cmd:

python test/performance/graph_classification/dgl/xxx.py --repeat 10 --dataset MUTAG

Environment: Tesla V100S-PCIE-32GB

model MUTAG COLLAB IMDBBINARY
base - gin 89.62 ~ 6.21 (14.98s/it) 72.86 ~ 1.80 (375.67s/it) 68.80 ~ 2.68 (69.53s/it)
model - gin 89.62 ~ 6.21 (15.07s/it) 72.86 ~ 1.80 (366.30s/it) 68.80 ~ 2.68 (74.64s/it)
trainer - gin 89.62 ~ 6.21 (15.71s/it) 72.86 ~ 1.80 (396.29s/it) 68.80 ~ 2.68 (69.95s/it)
trainer + dataset - gin 89.62 ~ 6.21 (15.50s/it) 72.86 ~ 1.80 (482.89s/it) 68.80 ~ 2.68 (74.36s/it)
solver - gin 89.62 ~ 6.21 (15.89s/it) 72.86 ~ 1.80 (481.45s/it) 68.80 ~ 2.68 (74.02s/it)

[PyG] Homogeneous Link Prediction

Starting cmd:

python test/performance/link_prediction/pyg/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer

Environment: Tesla V100S-PCIE-32GB

model cora citeseer pubmed
base - gcn 90.44 ~ 0.91 (2.20s/it) 90.30 ~ 0.79 (2.29s/it) 95.48 ~ 0.22 (24.06s/it)
model - gcn 90.44 ~ 0.91 (2.24s/it) 90.30 ~ 0.79 (2.25s/it) 95.48 ~ 0.22 (22.83s/it)
model_decouple - gcn 90.44 ~ 0.91 (2.20s/it) 90.30 ~ 0.80 (2.26s/it) 95.48 ~ 0.22 (23.53s/it)
trainer - gcn 90.44 ~ 0.91 (2.00s/it) 90.30 ~ 0.79 (2.00s/it) 95.48 ~ 0.22 (22.88s/it)
trainer + dataset - gcn 90.44 ~ 0.91 (2.01s/it) 90.30 ~ 0.79 (2.03s/it) 95.48 ~ 0.22 (24.25s/it)
solver - gcn 90.44 ~ 0.93 (2.56s/it) 90.30 ~ 0.79 (2.61s/it) 95.48 ~ 0.22 (26.49s/it)
base - gat 90.72 ~ 0.79 (2.46s/it) 90.10 ~ 0.84 (2.62s/it) 91.72 ~ 0.44 (23.11s/it)
model - gat 90.72 ~ 0.79 (2.49s/it) 90.10 ~ 0.84 (2.54s/it) 91.72 ~ 0.44 (22.53s/it)
model_decouple - gat 90.72 ~ 0.79 (2.44s/it) 90.10 ~ 0.84 (2.57s/it) 91.72 ~ 0.44 (23.85s/it)
trainer - gat 90.72 ~ 0.79 (2.26s/it) 90.10 ~ 0.84 (2.43s/it) 91.72 ~ 0.44 (22.56s/it)
trainer + dataset - gat 90.72 ~ 0.79 (2.25s/it) 90.10 ~ 0.84 (2.38s/it) 91.72 ~ 0.44 (23.03s/it)
solver - gat 90.72 ~ 0.79 (2.71s/it) 90.10 ~ 0.84 (3.00s/it) 91.72 ~ 0.44 (26.98s/it)
base - sage 88.59 ~ 0.99 (1.98s/it) 84.44 ~ 1.47 (2.23s/it) 87.11 ~ 1.20 (22.91s/it)
model - sage 88.52 ~ 1.04 (1.99s/it) 84.44 ~ 1.47 (2.24s/it) 87.11 ~ 1.20 (22.78s/it)
model_decouple - sage 88.53 ~ 1.05 (1.97s/it) 84.42 ~ 1.46 (2.16s/it) 87.11 ~ 1.20 (22.44s/it)
trainer - sage 88.57 ~ 0.98 (1.92s/it) 84.43 ~ 1.46 (2.16s/it) 87.11 ~ 1.20 (21.41s/it)
trainer + dataset - sage 88.58 ~ 0.99 (1.91s/it) 84.42 ~ 1.45 (2.13s/it) 87.10 ~ 1.20 (23.60s/it)
solver - sage 88.58 ~ 0.99 (2.42s/it) 84.42 ~ 1.45 (2.59s/it) 87.10 ~ 1.20 (25.82s/it)

[DGL] Link Prediction

Starting cmd:

python test/performance/link_prediction/dgl/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer

Environment: Tesla V100S-PCIE-32GB

model cora citeseer pubmed
base - gcn 87.44 ~ 1.72 (1.51s/it) 84.79 ~ 2.24 (1.80s/it) 91.23 ~ 1.87 (8.17s/it)
model - gcn 87.44 ~ 1.72 (1.57s/it) 84.79 ~ 2.24 (1.72s/it) 91.23 ~ 1.87 (8.09s/it)
trainer - gcn 87.44 ~ 1.72 (2.08s/it) 84.79 ~ 2.24 (2.83s/it) 91.23 ~ 1.87 (9.17s/it)
trainer + dataset - gcn 87.44 ~ 1.72 (1.71s/it) 84.79 ~ 2.24 (2.33s/it) 91.23 ~ 1.87 (9.14s/it)
solver - gcn 87.44 ~ 1.72 (1.75s/it) 84.79 ~ 2.24 (2.46s/it) 91.23 ~ 1.87 (9.74s/it)
base - gat 92.39 ~ 0.40 (1.83s/it) 91.69 ~ 0.96 (1.98s/it) 75.55 ~ 0.91 (8.51s/it)
model - gat 92.39 ~ 0.40 (2.02s/it) 91.69 ~ 0.96 (1.90s/it) 75.55 ~ 0.91 (8.53s/it)
trainer - gat 92.39 ~ 0.40 (2.47s/it) 91.69 ~ 0.96 (3.27s/it) 75.55 ~ 0.91 (9.39s/it)
trainer + dataset - gat 92.39 ~ 0.40 (2.45s/it) 91.69 ~ 0.96 (3.12s/it) 75.55 ~ 0.91 (9.48s/it)
solver - gat 92.39 ~ 0.40 (2.13s/it) 91.69 ~ 0.96 (2.94s/it) 75.55 ~ 0.91 (9.36s/it)
base - sage 88.49 ~ 0.91 (1.46s/it) 87.36 ~ 0.74 (1.49s/it) 76.25 ~ 0.43 (8.06s/it)
model - sage 88.49 ~ 0.91 (1.49s/it) 87.36 ~ 0.74 (1.61s/it) 76.25 ~ 0.43 (8.09s/it)
trainer - sage 88.49 ~ 0.91 (1.59s/it) 87.36 ~ 0.74 (2.58s/it) 76.25 ~ 0.43 (8.95s/it)
trainer + dataset - sage 88.49 ~ 0.91 (1.70s/it) 87.36 ~ 0.74 (2.51s/it) 76.25 ~ 0.43 (8.96s/it)
solver - sage 88.49 ~ 0.91 (1.51s/it) 87.36 ~ 0.74 (2.30s/it) 76.25 ~ 0.43 (9.74s/it)

[PyG] Robust Model under Mettack

Starting cmd:

python test/performance/robust_model/model.py --model gcn/xxx --repeat 10 --dataset Cora/PubMed/CiteSeer

model cora cora citeseer citeseer pubmed pubmed
ptb rate 0 5% 0 5% 0 5%
base - gcn
model - gcn
base - gcnsvd
model - gcnsvd
base - gnnjaccard
model - gnnjaccard
base - robustgcn
model - robustgcn
base - gnnguard
model - gnnguard

[PyG] Robust Model under Mettack

Starting cmd:

python test/performance/robust_model/model.py --model gcn/xxx --repeat 10 --dataset Cora/PubMed/CiteSeer

model cora cora citeseer citeseer pubmed pubmed
ptb rate 0 20% 0 20% 0 20%
base - gcn 0.8351 ~ 0.003 0.7884 ~ 0.008 0.7340 ~ 0.006 0.6963 ~ 0.011 0.8515 ~ 0.0007 0.7512 ~ 0.0026
model - gcn 0.8351 ~ 0.003 0.7944 ~ 0.005 0.7327 ~ 0.008 0.7327 ~ 0.008 0.8553 ~ 0.0007 0.7412 ~ 0.0057
base - gnnguard 0.7810 ~ 0.006 0.7660 ~ 0.005 0.6900 ~ 0.013 0.6940 ~ 0.008 0.8538 ~ 0.0049 0.8439 ~ 0.0031
model - gnnguard 0.7952 ~ 0.003 0.7711 ~ 0.003 0.7114 ~ 0.008 0.7120 ~ 0.002 0.8536 ~ 0.0014 0.8428 ~ 0.0010

SSL Test

train set performance

Model MUTAG PTC-MR PROTEINS NCI1
Ours-GCN 0.6389 ~ 0.0373 0.6382 ~ 0.0188 0.7874 ~ 0.0167 0.8102 ~ 0.0242
Ours-GIN 0.6278 ~ 0.1407 0.6941 ~ 0.0606 0.7568 ~ 0.0783 0.8107 ~ 0.0384
Ours-solver + GCN 0.8222 ~ 0.0222 0.7059 ~ 0.0416 0.8018 ~ 0.0221 0.9130 ~ 0.0125
Ours-solver + GIN 0.8667 ~ 0.0667 0.7059 ~ 0.1162 0.7657 ~ 0.0138 0.8875 ~ 0.0214

valid set performance

Model MUTAG PTC-MR PROTEINS NCI1
Ours-GCN 0.7394 ~ 0.0077 0.5817 ~ 0.0098 0.7323 ~ 0.0138 0.7246 ~ 0.0041
Ours-GIN 0.7735 ~ 0.0274 0.5755 ~ 0.0174 0.6579 ~ 0.0102 0.6956 ~ 0.0095
Ours-solver + GCN 0.8197 ~ 0.0057 0.6481 ~ 0.0031 0.7677 ~ 0.0008 0.7167 ~ 0.0030
Ours-solver + GIN 0.8424 ~ 0.0130 0.6266 ~ 0.0114 0.7271 ~ 0.0049 0.7167 ~ 0.0020

test set performance

Model MUTAG PTC-MR PROTEINS NCI1
GraphCL 0.8680 ~ 0.0134 - 0.7417 ~ 0.0034 0.7463 ~ 0.0025
Ours-GCN 0.8155 ~ 0.0457 0.6063 ~ 0.0382 0.7362 ~ 0.0238 0.7440 ~ 0.0050
Ours-GIN 0.8558 ~ 0.0621 0.5133 ~ 0.0245 0.7306 ~ 0.0163 0.7117 ~ 0.0091
Ours-solver + GCN 0.8600 ~ 0.0330 0.5342 ~ 0.0332 0.7471 ~ 0.0092 0.7217 ~ 0.0113
Ours-solver + GIN 0.8694 ~ 0.0134 0.5192 ~ 0.0477 0.7231 ~ 0.0144 0.7224 ~ 0.0103