Protein Representation Learning Leaderboard

If you know of a related paper that is missing from this page, please let us know. We will keep the page updated. Thank you for your contributions.

Supervised Method

| No. | Method | EC (Fmax) | EC (AUPR) | GO-BP (Fmax) | GO-BP (AUPR) | GO-MF (Fmax) | GO-MF (AUPR) | GO-CC (Fmax) | GO-CC (AUPR) | Fold-Fold (Accuracy) | Fold-Superfamily (Accuracy) | Fold-Family (Accuracy) | Reaction (Accuracy) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CNN | 0.545 | 0.526 | 0.244 | 0.159 | 0.354 | 0.351 | 0.287 | 0.204 | 0.113 | 0.134 | 0.534 | 0.517 |
| 2 | ResNet | 0.605 | 0.590 | 0.280 | 0.205 | 0.405 | 0.434 | 0.304 | 0.214 | 0.101 | 0.072 | 0.235 | 0.241 |
| 3 | LSTM | 0.425 | 0.414 | 0.225 | 0.156 | 0.321 | 0.334 | 0.283 | 0.192 | 0.064 | 0.043 | 0.181 | 0.110 |
| 4 | Transformer | 0.238 | 0.218 | 0.264 | 0.156 | 0.211 | 0.177 | 0.405 | 0.210 | 0.092 | 0.088 | 0.404 | 0.266 |
| 5 | GCN | 0.320 | 0.319 | 0.252 | 0.136 | 0.195 | 0.147 | 0.329 | 0.175 | 0.168 | 0.213 | 0.828 | 0.673 |
| 6 | GAT | 0.368 | 0.320 | 0.284 | 0.171 | 0.317 | 0.319 | 0.385 | 0.249 | 0.124 | 0.165 | 0.727 | 0.556 |
| 7 | GVP | 0.489 | 0.482 | 0.326 | 0.224 | 0.426 | 0.458 | 0.420 | 0.279 | 0.160 | 0.225 | 0.838 | 0.655 |
| 8 | 3DCNN_MQA | 0.077 | 0.029 | 0.240 | 0.132 | 0.147 | 0.075 | 0.305 | 0.144 | 0.316 | 0.454 | 0.925 | 0.722 |
| 9 | GraphQA | 0.509 | 0.543 | 0.308 | 0.199 | 0.329 | 0.347 | 0.413 | 0.256 | 0.237 | 0.325 | 0.844 | 0.608 |
| 10 | IEConv-I | - | - | - | - | - | - | - | - | 0.450 | 0.697 | 0.989 | 0.872 |
| 11 | IEConv-II | 0.735 | 0.775 | 0.374 | 0.273 | 0.544 | 0.572 | 0.444 | 0.316 | 0.476 | 0.702 | 0.992 | 0.872 |
| 12 | GearNet | 0.730 | 0.751 | 0.356 | 0.211 | 0.503 | 0.490 | 0.414 | 0.276 | 0.284 | 0.426 | 0.953 | 0.794 |
| 13 | GearNet-IEConv | 0.800 | 0.835 | 0.381 | 0.231 | 0.563 | 0.547 | 0.422 | 0.259 | 0.423 | 0.641 | 0.991 | 0.837 |
| 14 | GearNet-Edge | 0.810 | 0.872 | 0.403 | 0.251 | 0.580 | 0.570 | 0.450 | 0.303 | 0.440 | 0.667 | 0.991 | 0.866 |
| 15 | GearNet-Edge-IEConv | 0.810 | 0.843 | 0.400 | 0.244 | 0.581 | 0.561 | 0.430 | 0.284 | 0.483 | 0.703 | 0.995 | 0.853 |
| 16 | ProNet-Amino-Acid | - | - | - | - | - | - | - | - | 0.515 | 0.699 | 0.990 | 0.860 |
| 17 | ProNet-Backbone | - | - | - | - | - | - | - | - | 0.527 | 0.703 | 0.993 | 0.864 |
| 18 | ProNet-All-Atom | - | - | - | - | - | - | - | - | 0.521 | 0.690 | 0.990 | 0.856 |
| 19 | CDConv | 0.820 | - | 0.453 | - | 0.654 | - | 0.479 | - | 0.567 | 0.777 | 0.996 | 0.885 |

Pretrained Method

| No. | Method | Pre-training Dataset | EC (Fmax) | EC (AUPR) | GO-BP (Fmax) | GO-BP (AUPR) | GO-MF (Fmax) | GO-MF (AUPR) | GO-CC (Fmax) | GO-CC (AUPR) | Fold-Fold (Accuracy) | Fold-Superfamily (Accuracy) | Fold-Family (Accuracy) | Reaction (Accuracy) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | DeepFRI | Pfam (10 M) | 0.631 | 0.547 | 0.399 | 0.282 | 0.465 | 0.462 | 0.460 | 0.363 | 0.153 | 0.206 | 0.732 | 0.633 |
| 2 | ESM-1b | UniRef50 (24 M) | 0.864 | 0.889 | 0.470 | 0.343 | 0.657 | 0.639 | 0.488 | 0.384 | 0.268 | 0.601 | 0.978 | 0.831 |
| 3 | ProtBert-BFD | BFD (2.1 B) | 0.838 | 0.859 | 0.279 | 0.188 | 0.456 | 0.464 | 0.408 | 0.234 | 0.266 | 0.558 | 0.976 | 0.722 |
| 4 | LM-GVP | UniRef100 (216 M) | 0.664 | 0.710 | 0.417 | 0.302 | 0.545 | 0.580 | 0.527 | 0.423 | - | - | - | - |
| 5 | IEConv-II | PDB (476 K) | - | - | - | - | - | - | - | - | 0.503 | 0.806 | 0.997 | 0.876 |
| 6 | MT-LSTM | UniRef90 (76 M) | 0.817 | 0.851 | 0.442 | 0.324 | 0.591 | 0.608 | 0.492 | 0.381 | - | - | - | - |
| 7 | GearNet-Edge (Multiview Contrast) | AlphaFoldDB (805 K) | 0.874 | 0.982 | 0.490 | 0.292 | 0.654 | 0.596 | 0.488 | 0.336 | 0.541 | 0.805 | 0.999 | 0.875 |
| 8 | GearNet-Edge (Residue Type Prediction) | AlphaFoldDB (805 K) | 0.843 | 0.870 | 0.430 | 0.267 | 0.604 | 0.583 | 0.465 | 0.311 | 0.488 | 0.710 | 0.994 | 0.866 |
| 9 | GearNet-Edge (Distance Prediction) | AlphaFoldDB (805 K) | 0.839 | 0.863 | 0.448 | 0.274 | 0.616 | 0.586 | 0.464 | 0.327 | 0.509 | 0.735 | 0.994 | 0.875 |
| 10 | GearNet-Edge (Angle Prediction) | AlphaFoldDB (805 K) | 0.853 | 0.880 | 0.458 | 0.291 | 0.625 | 0.603 | 0.473 | 0.331 | 0.565 | 0.763 | 0.996 | 0.868 |
| 11 | GearNet-Edge (Dihedral Prediction) | AlphaFoldDB (805 K) | 0.859 | 0.881 | 0.458 | 0.304 | 0.626 | 0.603 | 0.465 | 0.338 | 0.518 | 0.778 | 0.996 | 0.870 |
| 12 | PromptProtein | UniRef50 + PDB + STRING (89 M) | 0.888 | 0.915 | 0.495 | 0.363 | 0.677 | 0.665 | 0.551 | 0.457 | - | - | - | - |
| 13 | ESM-GearNet | AlphaFoldDB (805 K) | 0.894 | 0.907 | 0.516 | 0.301 | 0.684 | 0.621 | 0.506 | 0.359 | - | - | - | - |
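
The EC and GO columns above report Fmax. This page does not spell out the evaluation protocol, so the following is only a minimal sketch of the commonly used protein-centric definition: for each decision threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all annotated proteins, and Fmax is the best resulting F1 score. The function name `fmax`, the evenly spaced threshold grid, and the choice to skip proteins without annotations are assumptions made for illustration; individual papers may differ in these details.

```python
from typing import Sequence


def fmax(
    y_true: Sequence[Sequence[int]],
    y_score: Sequence[Sequence[float]],
    n_thresholds: int = 101,
) -> float:
    """Protein-centric Fmax: best F1 over an evenly spaced threshold grid.

    y_true[i][j]  is 1 if protein i is annotated with term j, else 0.
    y_score[i][j] is the predicted score in [0, 1] for that pair.
    """
    best = 0.0
    for step in range(n_thresholds):
        t = step / (n_thresholds - 1)
        precisions = []
        recalls = []
        for labels, scores in zip(y_true, y_score):
            n_true = sum(labels)
            if n_true == 0:
                continue  # assumption: proteins without annotations are skipped
            predicted = [s >= t for s in scores]
            n_pred = sum(predicted)
            tp = sum(1 for l, p in zip(labels, predicted) if l and p)
            recalls.append(tp / n_true)
            if n_pred > 0:
                # precision only counts proteins with at least one prediction
                precisions.append(tp / n_pred)
        if not precisions or not recalls:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


if __name__ == "__main__":
    labels = [[1, 0, 1], [0, 1, 0]]
    scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
    print(fmax(labels, scores))
```

In the toy example, a threshold of 0.5 separates the annotated terms from the rest for both proteins, so the score is 1.0. Reproducing the numbers in the tables would require each method's own predictions and evaluation script.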

Dataset

Many types of protein datasets are available, and most are derived by processing the PDB files of the original proteins. CDConv and other methods currently provide processed versions of these datasets that can be downloaded as needed (⚠️ note that the versions processed by different methods are not compatible with one another):

  1. Enzyme Commission (EC): processed by CDConv, processed by GearNet
  2. Gene Ontology (GO, including GO-BP, GO-MF, GO-CC): processed by CDConv, processed by GearNet
  3. Protein Fold (including Fold, Family, Superfamily): processed by CDConv, processed by IEConv
  4. Enzyme Reaction (Reaction): processed by CDConv, processed by IEConv