awesome-protein-representation-learning

This repository contains a list of papers on Protein Representation Learning (PRL), categorized by publication year. We will try to keep this list up to date. If you find any errors or missing papers, please don't hesitate to open an issue or pull request.

Year 2023

  1. [bioRxiv 2023] Retrieved Sequence Augmentation for Protein Representation Learning [paper][code]
  2. [ICLR 2023] Protein Representation Learning by Geometric Structure Pretraining [paper]
  3. [ICLR 2023] Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins [paper]
  4. [ICLR 2023] Protein Representation Learning via Knowledge Enhanced Primary Structure Reasoning [paper]
  5. [ICLR 2023] Multi-level Protein Structure Pre-training via Prompt Learning [paper]
  6. [ICLR 2023] Learning Hierarchical Protein Representations via Complete 3D Graph Networks [paper]

Year 2022

  1. [bioRxiv 2022] Codon language embeddings provide strong signals for protein engineering [paper]
  2. [Arxiv 2022] When Geometric Deep Learning Meets Pretrained Protein Language Models [paper]
  3. [Arxiv 2022] Contrastive Representation Learning for 3D Protein Structures [paper]
  4. [Bioinformatics 2022] Structure-aware Protein Self-supervised Learning [paper][video]
  5. [KDD 2022] GBPNet: Universal Geometric Representation Learning on Protein Structures [paper][code]
  6. [Arxiv 2022] Directed Weight Neural Networks for Protein Structure Representation Learning [paper]
  7. [PLOS Computational Biology 2022] Fast protein structure comparison through effective representation learning with contrastive graph neural networks [paper][code]
  8. [NeurIPS 2022] Exploring evolution-aware & -free protein language models as protein function predictors [paper]
  9. [bioRxiv 2022] High-resolution de novo structure prediction from primary sequence [paper][code]
  10. [Bioinformatics 2022] ProteinBERT: A universal deep-learning model of protein sequence and function [paper][code]
  11. [Communications Biology 2022] Artificial Intelligence Guided Conformational Mining of Intrinsically Disordered Proteins [paper][code]
  12. [Cell Systems 2022] Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins [paper][code]
  13. [bioRxiv 2022] Convolutions are competitive with transformers for protein sequence pretraining [paper][code]
  14. [bioRxiv 2022] Masked inverse folding with sequence transfer for protein representation learning [paper][code]
  15. [Nature Methods 2022] Self-supervised deep learning encodes high-resolution features of protein subcellular localization [paper][code]
  16. [bioRxiv 2022] Language models of protein sequences at the scale of evolution enable accurate structure prediction [paper]
  17. [bioRxiv 2022] Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph neural networks [paper]
  18. [Bioinformatics 2022] Cross-Modality and Self-Supervised Protein Embedding for Compound–Protein Affinity and Contact Prediction [paper][code]
  19. [bioRxiv 2022] COLLAPSE: A representation learning framework for identification and characterization of protein structural sites [paper]
  20. [bioRxiv 2022] An Analysis of Protein Language Model Embeddings for Fold Prediction [paper]
  21. [ICLR 2022] OntoProtein: Protein Pretraining With Gene Ontology Embedding [paper][code]
  22. [Bioinformatics 2022] DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts [paper]
  23. [Briefings in Bioinformatics 2022] SPRoBERTa: protein embedding learning with local fragment modeling [paper]

Year 2021

  1. [Arxiv 2021] Pre-training co-evolutionary protein representation via a pairwise masked language model [paper]
  2. [NeurIPS 2021] Language models enable zero-shot prediction of the effects of mutations on protein function [paper][code]
  3. [ICML 2021] MSA Transformer [paper][code]
  4. [TPAMI 2021] ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Learning [paper][code]
  5. [Arxiv 2021] Modeling Protein Using Large-scale Pretrain Language Model [paper][code]
  6. [IEEE Access 2021] Pre-Training of Deep Bidirectional Protein Sequence Representations With Structural Information [paper][code]
  7. [Bioinformatics 2021] GraphQA: protein model quality assessment using graph convolutional networks [paper][code]
  8. [ICLR 2021] Intrinsic-extrinsic convolution and pooling for learning on 3d protein structures [paper][code]
  9. [ICLR 2021] Learning from Protein Structure with Geometric Vector Perceptrons [paper][code]
  10. [NeurIPS 2021] Multi-Scale Representation Learning on Proteins [paper][code]
  11. [PNAS 2021] Neural networks to learn protein sequence–function relationships from deep mutational scanning data [paper][code]
  12. [bioRxiv 2021] LM-GVP: A Generalizable Deep Learning Framework for Protein Property Prediction from Sequence and Structure [paper][code]
  13. [Cell Systems 2021] Learning the protein language: Evolution, structure, and function [paper][code]
  14. [Nature Communications 2021] Structure-based protein function prediction using graph convolutional networks [paper][code]
  15. [KDD 2021] Geometric Graph Representation Learning on Protein Structure Prediction [paper]
  16. [Arxiv 2021] Adversarial Contrastive Pre-training for Protein Sequences [paper]
  17. [Emerg Top Life Sci 2021] Graph representation learning for structural proteomics [paper]
  18. [Arxiv 2021] Graph Representation Learning in Biomedicine [paper]
  19. [Applied Sciences 2021] GraphMS: Drug Target Prediction Using Graph Representation Learning with Substructures [paper][code]
  20. [JMC 2021] InteractionGraphNet: A Novel and Efficient Deep Graph Representation Learning Framework for Accurate Protein−Ligand Interaction Predictions [paper][code]
  21. [bioRxiv 2021] Self-Supervised Representation Learning of Protein Tertiary Structures (PtsRep) and Its Implications for Protein Engineering [paper]
  22. [Algorithms 2021] Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures [paper][code]
  23. [bioRxiv 2021] Combining evolutionary and assay-labelled data for protein fitness prediction [paper]
  24. [Science 2021] Accurate prediction of protein structures and interactions using a three-track neural network [paper][code]
  25. [Nature 2021] Highly accurate protein structure prediction with AlphaFold [paper][code]
  26. [IEEE TCBB 2021] Sequence representations and their utility for predicting protein-protein interactions [paper]
  27. [Bioinformatics 2021] Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function [paper][code]
  28. [CVPR 2021] Fast end-to-end learning on protein surfaces [paper]
  29. [Briefings in Functional Genomics 2021] Pretraining model for biological sequence data [paper]
  30. [bioRxiv 2021] Toward More General Embeddings for Protein Design: Harnessing Joint Representations of Sequence and Structure [paper]
  31. [NeurIPS 2021] Neural Distance Embeddings for Biological Sequences [paper][code]
  32. [Computational Biology and Chemistry 2021] Convolutional neural networks with image representation of amino acid sequences for protein function prediction [paper][code]
  33. [bioRxiv 2021] Distillation of MSA Embeddings to Folded Protein Structures with Graph Transformers [paper]
  34. [chemRxiv 2021] Identification of Enzymatic Active Sites with Unsupervised Language Modeling [paper]
  35. [bioRxiv 2021] Deciphering the language of antibodies using self-supervised learning [paper]
  36. [bioRxiv 2021] Hydrogen bonds meet self-attention: all you need for general-purpose protein structure embedding [paper]
  37. [bioRxiv 2021] Improving Generalizability of Protein Sequence Models with Data Augmentations [paper]

Year 2020

  1. [BCB 2020] Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks [paper][code]
  2. [Bioinformatics 2020] UDSMProt: universal deep sequence models for protein classification [paper][code]
  3. [bioRxiv 2020] Self-Supervised Contrastive Learning of Protein Representations By Mutual Information Maximization [paper]
  4. [bioRxiv 2020] End-to-end multitask learning, from protein language to protein features without alignments [paper]
  5. [bioRxiv 2020] Language modelling for biological sequences – curated datasets and baselines [paper][code]
  6. [NeurIPS 2020] Is Transfer Learning Necessary for Protein Landscape Prediction? [paper]
  7. [PNAS 2020] Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [paper][code]
  8. [Arxiv 2020] Profile Prediction: An Alignment-Based Pre-Training Task for Protein Sequence Models [paper]
  9. [Arxiv 2020] ProGen: Language Modeling for Protein Generation [paper][code]
  10. [bioRxiv 2020] Evaluation of Methods for Protein Representation Learning: A Quantitative Analysis [paper]
  11. [NAR Genomics and Bioinformatics 2020] Mutation effect estimation on protein–protein interactions using deep contextualized representation learning [paper][code]
  12. [CSBJ 2020] Representation learning applications in biological sequence analysis [paper]
  13. [bioRxiv 2020] TripletProt: Deep Representation Learning of Proteins based on Siamese Networks [paper]
  14. [RECOMB 2020] Evolutionary context-integrated deep sequence modeling for protein engineering [paper]
  15. [Arxiv 2020] What is a meaningful representation of protein sequences? [paper][code]
  16. [bioRxiv 2020] Transformer protein language models are unsupervised structure learners [paper][code]
  17. [bioRxiv 2020] Deep learning-based prediction of protein structure using learned representations of multiple sequence alignments [paper]

Year 2019 and before

  1. [Cells 2019] A High Efficient Biological Language Model for Predicting Protein–Protein Interactions [paper][code]
  2. [Nature Methods 2019] Unified rational protein engineering with sequence-only deep representation learning [paper][code]
  3. [NeurIPS 2019] Evaluating Protein Transfer Learning with TAPE [paper][code]
  4. [Nature Communications 2019] Deciphering protein evolution and fitness landscapes with latent space models [paper][code]
  5. [bioRxiv 2019] DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences [paper][code]
  6. [ACS Nano 2019] A Self-Consistent Sonification Method to Translate Amino Acid Sequences into Musical Compositions and Application in Protein Design Using Artificial Intelligence [paper]
  7. [bioRxiv 2019] Augmenting protein network embeddings with sequence information [paper]
  8. [Nature Machine Intelligence 2019] Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins [paper][code]
  9. [bioRxiv 2019] Modeling the Language of Life – Deep Learning Protein Sequences [paper][code]
  10. [ICLR 2019] Learning protein sequence embeddings using information from structure [paper][code]
  11. [BIBM 2019] GraphCPI: Graph Neural Representation Learning for Compound-Protein Interaction [paper]
  12. [Bioinformatics 2018] Learned protein embeddings for machine learning [paper][code]
  13. [Bioinformatics 2018] Deep convolutional networks for quality assessment of protein folds [paper][code]
  14. [bioRxiv 2018] Deep Semantic Protein Representation for Annotation, Discovery, and Engineering [paper][code]
  15. [bioRxiv 2017] Predicting Protein Binding Affinity With Word Embeddings and Recurrent Neural Networks [paper][code]
  16. [Arxiv 2017] Variational auto-encoding of protein sequences [paper][code]
  17. [Arxiv 2016] Distributed Representations for Biological Sequence Analysis [paper]
  18. [Bioinformatics 2015] ProFET: Feature engineering captures high-level protein functions [paper]

Related Awesome

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@article{wu2022survey,
  title={A Survey on Protein Representation Learning: Retrospect and Prospect},
  author={Wu, Lirong and Huang, Yufei and Lin, Haitao and Li, Stan Z},
  journal={arXiv preprint arXiv:2301.00813},
  year={2022}
}

Contact

If you have any questions about this work, please feel free to contact me by email: