[Internal] Performance Consistency Check Leaderboard
Frozenmad opened this issue · 10 comments
This issue tracks whether the library reproduces the performance of the natively implemented models.
WARNING: These are not the evaluation results of this library. For benchmarking of AutoGL, please see the examples provided.
Guide to developers
What do we mean when we are checking performance?
First, remember that a performance inconsistency is not necessarily caused by our implementation. Sometimes you need to increase the number of repeats or change the range of seeds to see whether the performances match under the "same" setting.
If that does not resolve the mismatch, carefully check your code for unintended implementation differences. The performance check code itself may also be incorrect, in which case you should report it to @Frozenmad.
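The comparison described above can be sketched as follows. This is a minimal illustration, assuming each run produces one score per seed and that the reported spread is a population standard deviation. The helper names `summarize` and `is_consistent` and the tolerance `tol` are hypothetical and not part of AutoGL.

```python
import statistics


def summarize(scores):
    """Return (mean, population std) of the per-seed scores.

    Assumption: the spread reported in the tables is a population
    standard deviation; the issue does not state which estimator is used.
    """
    return statistics.mean(scores), statistics.pstdev(scores)


def is_consistent(base_scores, lib_scores, tol=0.01):
    """Check whether two sets of per-seed scores agree in mean and std.

    `tol` is a hypothetical tolerance chosen to match the two-decimal
    precision used in the tables.
    """
    base_mean, base_std = summarize(base_scores)
    lib_mean, lib_std = summarize(lib_scores)
    return abs(base_mean - lib_mean) <= tol and abs(base_std - lib_std) <= tol


# Same seeds, same scores: the two implementations match.
print(is_consistent([77.0, 78.0, 79.0], [77.0, 78.0, 79.0]))
# One seed differs: the means no longer agree within the tolerance.
print(is_consistent([77.0, 78.0, 79.0], [77.0, 78.0, 80.0]))
```

If the check fails with few repeats, rerunning both sides with more repeats or a different seed range helps separate run-to-run noise from a real implementation difference.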
Note
All performance check results are listed below. Performance inconsistencies are shown in bold in the tables.
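Each table cell follows the pattern `mean ~ std (seconds/it)`. As a rough sketch of how inconsistencies could be flagged automatically, the snippet below parses such cells and compares every stage against the `base` row. The function names `parse_cell` and `inconsistent_stages` are hypothetical helpers, not part of the AutoGL test scripts.

```python
import re

# Matches "mean ~ std" with an optional "(Ns/it)" timing suffix.
_CELL = re.compile(
    r"(\d+(?:\.\d+)?)\s*~\s*(\d+(?:\.\d+)?)"
    r"(?:\s*\((\d+(?:\.\d+)?)s/it\))?"
)


def parse_cell(cell):
    """Parse one table cell into (mean, std, sec_per_it); None if empty."""
    match = _CELL.search(cell)
    if match is None:
        return None
    mean, std, sec = match.groups()
    return (float(mean), float(std), float(sec) if sec else None)


def inconsistent_stages(cells, reference="base", tol=1e-9):
    """Return the stages whose mean or std differs from the reference row.

    Timing is ignored because it varies from run to run.
    """
    ref = parse_cell(cells[reference])
    if ref is None:
        return []
    flagged = []
    for stage, cell in cells.items():
        parsed = parse_cell(cell)
        if stage == reference or parsed is None:
            continue
        if abs(parsed[0] - ref[0]) > tol or abs(parsed[1] - ref[1]) > tol:
            flagged.append(stage)
    return flagged


# Cells taken from the PyG node classification table (gcn, cora).
pyg_gcn_cora = {
    "base": "79.92 ~ 0.45 (1.15s/it)",
    "model": "79.92 ~ 0.45 (1.12s/it)",
    "trainer": "79.93 ~ 0.45 (1.42s/it)",
}
print(inconsistent_stages(pyg_gcn_cora))
```

The default tolerance demands an exact match of the printed values. A looser `tol` would be a judgment call about how much drift counts as an inconsistency.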
[DGL] Homogeneous Node Classification
Starting cmd:
python test/performance/node_classification/dgl/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer
Environment: Tesla V100S-PCIE-32GB
model | cora | citeseer | pubmed |
---|---|---|---|
base - gcn | 77.93 ~ 1.43 (1.57s/it) | 63.59 ~ 1.81 (1.56s/it) | 75.91 ~ 0.73 (1.67s/it) |
model - gcn | 77.93 ~ 1.43 (1.63s/it) | 63.59 ~ 1.81 (1.70s/it) | 75.91 ~ 0.73 (1.76s/it) |
model (decouple) - gcn | 77.93 ~ 1.43 (1.60s/it) | 63.59 ~ 1.81 (1.58s/it) | 75.91 ~ 0.73 (1.59s/it) |
trainer - gcn | 77.93 ~ 1.43 (1.94s/it) | 63.59 ~ 1.81 (1.96s/it) | 75.91 ~ 0.73 (2.02s/it) |
trainer + dataset - gcn | 77.93 ~ 1.43 (1.97s/it) | 63.59 ~ 1.81 (1.97s/it) | 75.91 ~ 0.73 (1.96s/it) |
solver - gcn | 77.93 ~ 1.43 (2.04s/it) | 63.59 ~ 1.81 (1.99s/it) | 75.91 ~ 0.73 (2.00s/it) |
base - gat | 81.41 ~ 0.80 (2.21s/it) | 67.51 ~ 1.03 (2.29s/it) | 75.55 ~ 0.91 (2.35s/it) |
model - gat | 81.41 ~ 0.80 (2.39s/it) | 67.51 ~ 1.03 (2.35s/it) | 75.55 ~ 0.91 (2.53s/it) |
model (decouple) - gat | 81.41 ~ 0.80 (2.20s/it) | 67.51 ~ 1.03 (2.53s/it) | 75.55 ~ 0.91 (2.38s/it) |
trainer - gat | 81.41 ~ 0.80 (2.83s/it) | 67.51 ~ 1.03 (2.90s/it) | 75.55 ~ 0.91 (2.94s/it) |
trainer + dataset - gat | 81.41 ~ 0.80 (2.85s/it) | 67.51 ~ 1.03 (2.92s/it) | 75.55 ~ 0.91 (3.04s/it) |
solver - gat | 81.41 ~ 0.80 (2.95s/it) | 67.51 ~ 1.03 (2.84s/it) | 75.55 ~ 0.91 (3.05s/it) |
base - sage | 81.23 ~ 0.52 (1.20s/it) | 69.51 ~ 1.12 (1.19s/it) | 76.25 ~ 0.43 (1.27s/it) |
model - sage | 81.23 ~ 0.52 (1.19s/it) | 69.50 ~ 1.14 (1.18s/it) | 76.25 ~ 0.43 (1.27s/it) |
model (decouple) - sage | 81.23 ~ 0.52 (1.19s/it) | 69.50 ~ 1.14 (1.27s/it) | 76.25 ~ 0.43 (1.34s/it) |
trainer - sage | 81.23 ~ 0.52 (1.55s/it) | 69.50 ~ 1.14 (1.58s/it) | 76.25 ~ 0.43 (1.67s/it) |
trainer + dataset - sage | 81.23 ~ 0.52 (1.53s/it) | 69.50 ~ 1.14 (1.58s/it) | 76.25 ~ 0.43 (1.65s/it) |
solver - sage | 81.23 ~ 0.52 (1.57s/it) | 69.50 ~ 1.14 (1.61s/it) | 76.25 ~ 0.43 (1.64s/it) |
[PyG] Homogeneous Node Classification
Starting cmd:
python test/performance/node_classification/pyg/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer
Environment: Tesla V100S-PCIE-32GB
model | cora | citeseer | pubmed |
---|---|---|---|
base - gcn | 79.92 ~ 0.45 (1.15s/it) | 67.13 ~ 1.71 (1.12s/it) | 76.74 ~ 0.36 (1.12s/it) |
model - gcn | 79.92 ~ 0.45 (1.12s/it) | 67.13 ~ 1.71 (1.13s/it) | 76.74 ~ 0.36 (1.16s/it) |
model (decouple) - gcn | 79.92 ~ 0.45 (1.14s/it) | 67.13 ~ 1.71 (1.16s/it) | 76.74 ~ 0.36 (1.22s/it) |
trainer - gcn | 79.93 ~ 0.45 (1.42s/it) | 67.13 ~ 1.71 (1.43s/it) | 76.74 ~ 0.36 (1.47s/it) |
trainer + dataset - gcn | 79.92 ~ 0.45 (1.43s/it) | 67.13 ~ 1.71 (1.42s/it) | 76.74 ~ 0.36 (1.42s/it) |
solver - gcn | 79.92 ~ 0.45 (1.53s/it) | 67.13 ~ 1.71 (1.60s/it) | 76.74 ~ 0.36 (1.53s/it) |
base - gat | 81.80 ~ 1.24 (1.73s/it) | 70.75 ~ 0.85 (1.94s/it) | 76.65 ~ 1.02 (1.86s/it) |
model - gat | 81.80 ~ 1.24 (1.76s/it) | 70.75 ~ 0.85 (1.82s/it) | 76.65 ~ 1.02 (1.87s/it) |
model (decouple) - gat | 81.80 ~ 1.24 (1.80s/it) | 70.75 ~ 0.85 (1.78s/it) | 76.65 ~ 1.02 (2.05s/it) |
trainer - gat | 81.80 ~ 1.24 (2.31s/it) | 70.75 ~ 0.85 (2.28s/it) | 76.65 ~ 1.02 (2.40s/it) |
trainer + dataset - gat | 81.80 ~ 1.24 (2.30s/it) | 70.75 ~ 0.85 (2.31s/it) | 76.65 ~ 1.02 (2.39s/it) |
solver - gat | 81.80 ~ 1.24 (2.05s/it) | 70.75 ~ 0.85 (2.24s/it) | 76.65 ~ 1.02 (2.33s/it) |
base - sage | 78.21 ~ 0.60 (1.14s/it) | 67.24 ~ 0.99 (1.18s/it) | 75.61 ~ 0.53 (1.34s/it) |
model - sage | 78.21 ~ 0.60 (1.05s/it) | 67.24 ~ 0.99 (1.24s/it) | 75.61 ~ 0.53 (1.35s/it) |
trainer - sage | 78.21 ~ 0.60 (1.24s/it) | 67.24 ~ 0.99 (1.48s/it) | 75.61 ~ 0.53 (1.63s/it) |
trainer + dataset - sage | 78.21 ~ 0.60 (1.24s/it) | 67.24 ~ 0.99 (1.48s/it) | 75.62 ~ 0.51 (1.62s/it) |
solver - sage | 78.21 ~ 0.60 (1.30s/it) | 67.24 ~ 0.99 (1.67s/it) | 75.62 ~ 0.51 (1.77s/it) |
[DGL] Heterogeneous Node Classification
Starting cmd:
python test/performance/node_classification/dgl/hetero_xxx.py --model xxx --repeat 10 --dataset xxx
Environment: [fill this env]
model | ACM | ACM3025 | xxx |
---|---|---|---|
base - hgt | 0.4025 ~ 0.0055 (119.67s/it) | | |
model - hgt | 0.4007 ~ 0.0051 (119.35s/it) | | |
trainer - hgt | 0.3946 ~ 0.0067 (33.49s/it) | | |
trainer + dataset - hgt | | | |
solver - hgt | | | |
base - heteroRGCN | 0.4033 ~ 0.0013 (16.20s/it) | | |
model - heteroRGCN | 0.4043 ~ 0.0015 (14.90s/it) | | |
trainer - heteroRGCN | 0.3995 ~ 0.0015 (7.20s/it) | | |
trainer + dataset - heteroRGCN | | | |
solver - heteroRGCN | | | |
base - han | 0.9123 ~ 0.0072 (43.62s/it) | 0.8655 ~ 0.0139 (44.58s/it) | |
model - han | 0.9000 ~ 0.0099 (159.69s/it) | | |
trainer - han | 0.9048 ~ 0.0055 (92.51s/it) | | |
trainer + dataset - han | | | |
solver - han | | | |
[PyG] Homogeneous Graph Classification
Starting cmd:
python test/performance/graph_classification/pyg/xxx.py --model gin --repeat 10 --dataset MUTAG/COLLAB/IMDBBINARY
Environment: Tesla V100S-PCIE-32GB
model | MUTAG | COLLAB | IMDBBINARY |
---|---|---|---|
base - gin | 82.31 ~ 8.63 (6.68s/it) | | |
model - gin | 89.23 ~ 4.49 (5.87s/it) | | |
trainer - gin | 80.00 ~ 7.05 (6.14s/it) | | |
trainer + dataset - gin | 76.92 ~ 7.50 (5.42s/it) | | |
solver - gin | 90.38 ~ 6.26 (5.91s/it) | | |
Environment: Tesla V100S-PCIE-32GB
Considering the randomness in PyG, the results over 100 repeats are reported below:
model | MUTAG | COLLAB | IMDBBINARY |
---|---|---|---|
base - gin | 85.27 ~ 6.66 (4.08s/it) | | |
model - gin | 88.77 ~ 5.64 (4.79s/it) | | |
trainer - gin | 81.04 ~ 9.28 (5.41s/it) | | |
trainer + dataset - gin | 80.42 ~ 9.56 (5.24s/it) | | |
solver - gin | 88.69 ~ 7.11 (5.05s/it) | | |
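
To illustrate why more repeats narrow the comparison, here is a rough worked example. It assumes the reported spread is a standard deviation over independent runs, so the standard error of the mean is `std / sqrt(repeats)`. The helper `standard_error` is hypothetical; the inputs are the `base - gin` MUTAG cells from the two tables above.

```python
import math


def standard_error(std, repeats):
    """Standard error of the mean for `repeats` independent runs."""
    return std / math.sqrt(repeats)


# base - gin on MUTAG with 10 repeats (std 8.63).
print(round(standard_error(8.63, 10), 2))
# base - gin on MUTAG with 100 repeats (std 6.66).
print(round(standard_error(6.66, 100), 2))
```

Under this assumption the uncertainty of the mean drops from roughly 2.7 points to under 1 point, which makes a real gap between stages easier to tell apart from noise.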
[PyG] NAS
Environment: GeForce GTX TITAN X
space | algorithm | cora | citeseer | pubmed |
---|---|---|---|---|
graphnas | graphnas | 81.5 | 70.8 | |
graphnas | random | 81.1 | 70.0 | |
singlepath | enas | 82.3 | 69.9 | |
singlepath | darts | 81.9 | 72.2 | |
[DGL] Homogeneous Graph Classification
Starting cmd:
python test/performance/graph_classification/dgl/xxx.py --repeat 10 --dataset MUTAG
Environment: Tesla V100S-PCIE-32GB
model | MUTAG | COLLAB | IMDBBINARY |
---|---|---|---|
base - gin | 89.62 ~ 6.21 (14.98s/it) | 72.86 ~ 1.80 (375.67s/it) | 68.80 ~ 2.68 (69.53s/it) |
model - gin | 89.62 ~ 6.21 (15.07s/it) | 72.86 ~ 1.80 (366.30s/it) | 68.80 ~ 2.68 (74.64s/it) |
trainer - gin | 89.62 ~ 6.21 (15.71s/it) | 72.86 ~ 1.80 (396.29s/it) | 68.80 ~ 2.68 (69.95s/it) |
trainer + dataset - gin | 89.62 ~ 6.21 (15.50s/it) | 72.86 ~ 1.80 (482.89s/it) | 68.80 ~ 2.68 (74.36s/it) |
solver - gin | 89.62 ~ 6.21 (15.89s/it) | 72.86 ~ 1.80 (481.45s/it) | 68.80 ~ 2.68 (74.02s/it) |
[PyG] Homogeneous Link Prediction
Starting cmd:
python test/performance/link_prediction/pyg/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer
Environment: Tesla V100S-PCIE-32GB
model | cora | citeseer | pubmed |
---|---|---|---|
base - gcn | 90.44 ~ 0.91 (2.20s/it) | 90.30 ~ 0.79 (2.29s/it) | 95.48 ~ 0.22 (24.06s/it) |
model - gcn | 90.44 ~ 0.91 (2.24s/it) | 90.30 ~ 0.79 (2.25s/it) | 95.48 ~ 0.22 (22.83s/it) |
model_decouple - gcn | 90.44 ~ 0.91 (2.20s/it) | 90.30 ~ 0.80 (2.26s/it) | 95.48 ~ 0.22 (23.53s/it) |
trainer - gcn | 90.44 ~ 0.91 (2.00s/it) | 90.30 ~ 0.79 (2.00s/it) | 95.48 ~ 0.22 (22.88s/it) |
trainer + dataset - gcn | 90.44 ~ 0.91 (2.01s/it) | 90.30 ~ 0.79 (2.03s/it) | 95.48 ~ 0.22 (24.25s/it) |
solver - gcn | 90.44 ~ 0.93 (2.56s/it) | 90.30 ~ 0.79 (2.61s/it) | 95.48 ~ 0.22 (26.49s/it) |
base - gat | 90.72 ~ 0.79 (2.46s/it) | 90.10 ~ 0.84 (2.62s/it) | 91.72 ~ 0.44 (23.11s/it) |
model - gat | 90.72 ~ 0.79 (2.49s/it) | 90.10 ~ 0.84 (2.54s/it) | 91.72 ~ 0.44 (22.53s/it) |
model_decouple - gat | 90.72 ~ 0.79 (2.44s/it) | 90.10 ~ 0.84 (2.57s/it) | 91.72 ~ 0.44 (23.85s/it) |
trainer - gat | 90.72 ~ 0.79 (2.26s/it) | 90.10 ~ 0.84 (2.43s/it) | 91.72 ~ 0.44 (22.56s/it) |
trainer + dataset - gat | 90.72 ~ 0.79 (2.25s/it) | 90.10 ~ 0.84 (2.38s/it) | 91.72 ~ 0.44 (23.03s/it) |
solver - gat | 90.72 ~ 0.79 (2.71s/it) | 90.10 ~ 0.84 (3.00s/it) | 91.72 ~ 0.44 (26.98s/it) |
base - sage | 88.59 ~ 0.99 (1.98s/it) | 84.44 ~ 1.47 (2.23s/it) | 87.11 ~ 1.20 (22.91s/it) |
model - sage | 88.52 ~ 1.04 (1.99s/it) | 84.44 ~ 1.47 (2.24s/it) | 87.11 ~ 1.20 (22.78s/it) |
model_decouple - sage | 88.53 ~ 1.05 (1.97s/it) | 84.42 ~ 1.46 (2.16s/it) | 87.11 ~ 1.20 (22.44s/it) |
trainer - sage | 88.57 ~ 0.98 (1.92s/it) | 84.43 ~ 1.46 (2.16s/it) | 87.11 ~ 1.20 (21.41s/it) |
trainer + dataset - sage | 88.58 ~ 0.99 (1.91s/it) | 84.42 ~ 1.45 (2.13s/it) | 87.10 ~ 1.20 (23.60s/it) |
solver - sage | 88.58 ~ 0.99 (2.42s/it) | 84.42 ~ 1.45 (2.59s/it) | 87.10 ~ 1.20 (25.82s/it) |
[DGL] Link Prediction
Starting cmd:
python test/performance/link_prediction/dgl/xxx.py --model gcn/gat/sage --repeat 10 --dataset Cora/PubMed/CiteSeer
Environment: Tesla V100S-PCIE-32GB
model | cora | citeseer | pubmed |
---|---|---|---|
base - gcn | 87.44 ~ 1.72 (1.51s/it) | 84.79 ~ 2.24 (1.80s/it) | 91.23 ~ 1.87 (8.17s/it) |
model - gcn | 87.44 ~ 1.72 (1.57s/it) | 84.79 ~ 2.24 (1.72s/it) | 91.23 ~ 1.87 (8.09s/it) |
trainer - gcn | 87.44 ~ 1.72 (2.08s/it) | 84.79 ~ 2.24 (2.83s/it) | 91.23 ~ 1.87 (9.17s/it) |
trainer + dataset - gcn | 87.44 ~ 1.72 (1.71s/it) | 84.79 ~ 2.24 (2.33s/it) | 91.23 ~ 1.87 (9.14s/it) |
solver - gcn | 87.44 ~ 1.72 (1.75s/it) | 84.79 ~ 2.24 (2.46s/it) | 91.23 ~ 1.87 (9.74s/it) |
base - gat | 92.39 ~ 0.40 (1.83s/it) | 91.69 ~ 0.96 (1.98s/it) | 75.55 ~ 0.91 (8.51s/it) |
model - gat | 92.39 ~ 0.40 (2.02s/it) | 91.69 ~ 0.96 (1.90s/it) | 75.55 ~ 0.91 (8.53s/it) |
trainer - gat | 92.39 ~ 0.40 (2.47s/it) | 91.69 ~ 0.96 (3.27s/it) | 75.55 ~ 0.91 (9.39s/it) |
trainer + dataset - gat | 92.39 ~ 0.40 (2.45s/it) | 91.69 ~ 0.96 (3.12s/it) | 75.55 ~ 0.91 (9.48s/it) |
solver - gat | 92.39 ~ 0.40 (2.13s/it) | 91.69 ~ 0.96 (2.94s/it) | 75.55 ~ 0.91 (9.36s/it) |
base - sage | 88.49 ~ 0.91 (1.46s/it) | 87.36 ~ 0.74 (1.49s/it) | 76.25 ~ 0.43 (8.06s/it) |
model - sage | 88.49 ~ 0.91 (1.49s/it) | 87.36 ~ 0.74 (1.61s/it) | 76.25 ~ 0.43 (8.09s/it) |
trainer - sage | 88.49 ~ 0.91 (1.59s/it) | 87.36 ~ 0.74 (2.58s/it) | 76.25 ~ 0.43 (8.95s/it) |
trainer + dataset - sage | 88.49 ~ 0.91 (1.70s/it) | 87.36 ~ 0.74 (2.51s/it) | 76.25 ~ 0.43 (8.96s/it) |
solver - sage | 88.49 ~ 0.91 (1.51s/it) | 87.36 ~ 0.74 (2.30s/it) | 76.25 ~ 0.43 (9.74s/it) |
[PyG] Robust Model under Mettack
Starting cmd:
python test/performance/robust_model/model.py --model gcn/xxx --repeat 10 --dataset Cora/PubMed/CiteSeer
model | cora | cora | citeseer | citeseer | pubmed | pubmed |
---|---|---|---|---|---|---|
ptb rate | 0 | 5% | 0 | 5% | 0 | 5% |
base - gcn | | | | | | |
model - gcn | | | | | | |
base - gcnsvd | | | | | | |
model - gcnsvd | | | | | | |
base - gnnjaccard | | | | | | |
model - gnnjaccard | | | | | | |
base - robustgcn | | | | | | |
model - robustgcn | | | | | | |
base - gnnguard | | | | | | |
model - gnnguard | | | | | | |
[PyG] Robust Model under Mettack
Starting cmd:
python test/performance/robust_model/model.py --model gcn/xxx --repeat 10 --dataset Cora/PubMed/CiteSeer
model | cora | cora | citeseer | citeseer | pubmed | pubmed |
---|---|---|---|---|---|---|
ptb rate | 0 | 20% | 0 | 20% | 0 | 20% |
base - gcn | 0.8351 ~ 0.003 | 0.7884 ~ 0.008 | 0.7340 ~ 0.006 | 0.6963 ~ 0.011 | 0.8515 ~ 0.0007 | 0.7512 ~ 0.0026 |
model - gcn | 0.8351 ~ 0.003 | 0.7944 ~ 0.005 | 0.7327 ~ 0.008 | 0.7327 ~ 0.008 | 0.8553 ~ 0.0007 | 0.7412 ~ 0.0057 |
base - gnnguard | 0.7810 ~ 0.006 | 0.7660 ~ 0.005 | 0.6900 ~ 0.013 | 0.6940 ~ 0.008 | 0.8538 ~ 0.0049 | 0.8439 ~ 0.0031 |
model - gnnguard | 0.7952 ~ 0.003 | 0.7711 ~ 0.003 | 0.7114 ~ 0.008 | 0.7120 ~ 0.002 | 0.8536 ~ 0.0014 | 0.8428 ~ 0.0010 |
SSL Test
train set performance
Model | MUTAG | PTC-MR | PROTEINS | NCI1 |
---|---|---|---|---|
Ours-GCN | 0.6389 ~ 0.0373 | 0.6382 ~ 0.0188 | 0.7874 ~ 0.0167 | 0.8102 ~ 0.0242 |
Ours-GIN | 0.6278 ~ 0.1407 | 0.6941 ~ 0.0606 | 0.7568 ~ 0.0783 | 0.8107 ~ 0.0384 |
Ours-solver + GCN | 0.8222 ~ 0.0222 | 0.7059 ~ 0.0416 | 0.8018 ~ 0.0221 | 0.9130 ~ 0.0125 |
Ours-solver + GIN | 0.8667 ~ 0.0667 | 0.7059 ~ 0.1162 | 0.7657 ~ 0.0138 | 0.8875 ~ 0.0214 |
valid set performance
Model | MUTAG | PTC-MR | PROTEINS | NCI1 |
---|---|---|---|---|
Ours-GCN | 0.7394 ~ 0.0077 | 0.5817 ~ 0.0098 | 0.7323 ~ 0.0138 | 0.7246 ~ 0.0041 |
Ours-GIN | 0.7735 ~ 0.0274 | 0.5755 ~ 0.0174 | 0.6579 ~ 0.0102 | 0.6956 ~ 0.0095 |
Ours-solver + GCN | 0.8197 ~ 0.0057 | 0.6481 ~ 0.0031 | 0.7677 ~ 0.0008 | 0.7167 ~ 0.0030 |
Ours-solver + GIN | 0.8424 ~ 0.0130 | 0.6266 ~ 0.0114 | 0.7271 ~ 0.0049 | 0.7167 ~ 0.0020 |
test set performance
Model | MUTAG | PTC-MR | PROTEINS | NCI1 |
---|---|---|---|---|
GraphCL | 0.8680 ~ 0.0134 | - | 0.7417 ~ 0.0034 | 0.7463 ~ 0.0025 |
Ours-GCN | 0.8155 ~ 0.0457 | 0.6063 ~ 0.0382 | 0.7362 ~ 0.0238 | 0.7440 ~ 0.0050 |
Ours-GIN | 0.8558 ~ 0.0621 | 0.5133 ~ 0.0245 | 0.7306 ~ 0.0163 | 0.7117 ~ 0.0091 |
Ours-solver + GCN | 0.8600 ~ 0.0330 | 0.5342 ~ 0.0332 | 0.7471 ~ 0.0092 | 0.7217 ~ 0.0113 |
Ours-solver + GIN | 0.8694 ~ 0.0134 | 0.5192 ~ 0.0477 | 0.7231 ~ 0.0144 | 0.7224 ~ 0.0103 |