List of papers about Protein Design using Deep Learning

About this repository

Inspired by Kevin Kaichuang Yang's Machine-learning-for-proteins. Because deep-learning (DL) protein design is changing so quickly, I started this continuously updated repository to record papers in the field for newcomers like me.
My notes on these papers are shared in a Zhihu column (Simplified Chinese).

Menu

1.Reviews

Deep learning in protein structural modeling and design
Gao, Wenhao, et al.
Patterns 1.9 (2020)

Deep generative modeling for protein design
Strokach, Alexey, and Philip M. Kim.
Current Opinion in Structural Biology 72 (2022)

Protein sequence design with deep generative models
Wu, Zachary, et al.
Current Opinion in Chemical Biology 65 (2021)
Notes of mine

Structure-based protein design with deep learning
Ovchinnikov, Sergey, and Po-Ssu Huang.
Current Opinion in Chemical Biology 65 (2021)
Notes of mine

2.Hallucination

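Hallucination inverts a structure-prediction network for design: instead of predicting a structure from a given sequence, the sequence itself is optimized until the network's prediction satisfies a design objective.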
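As a rough illustration of the idea, the sketch below runs Monte Carlo simulated annealing in sequence space, in the spirit of trRosetta-based hallucination: propose a single-point mutation, score the new sequence, and accept or reject it with the Metropolis criterion. The scoring function `predict_confidence` is hypothetical. A real pipeline would call a structure predictor such as trRosetta or AlphaFold2 and score its output, whereas this toy version simply rewards helix-favoring residues so the example runs on its own. The temperature schedule and all parameter values are illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def predict_confidence(seq: str) -> float:
    """Hypothetical stand-in for a structure predictor's confidence score.

    A real pipeline would run a network such as trRosetta or AlphaFold2 here;
    this toy version just returns the fraction of helix-favoring residues.
    """
    return sum(aa in "ALEK" for aa in seq) / len(seq)


def hallucinate(length=50, steps=2000, t_start=0.1, t_end=0.001, seed=0):
    """Toy Monte Carlo simulated annealing over amino-acid sequences."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # random start
    score = predict_confidence("".join(seq))
    best_seq, best_score = "".join(seq), score

    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))

        # Propose a single-point mutation.
        pos = rng.randrange(length)
        old_aa = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = predict_confidence("".join(seq))

        # Metropolis criterion: keep improvements, sometimes keep worse moves.
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score = new_score
            if score > best_score:
                best_seq, best_score = "".join(seq), score
        else:
            seq[pos] = old_aa  # reject: revert the mutation

    return best_seq, best_score


if __name__ == "__main__":
    designed, conf = hallucinate()
    print(designed, round(conf, 2))
```

Swapping the toy score for a real network-derived objective is where the actual design work happens; some of the papers below replace this MCMC loop with gradient-based optimization through the network.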

2.1.trRosetta-based

Design of proteins presenting discontinuous functional sites using deep learning
Tischer, Doug, et al.
bioRxiv (2020)

De novo protein design by deep network hallucination
Anishchenko, I., Pellock, S.J., Chidyausiku, T.M. et al.
Nature (2021)

Protein sequence design by conformational landscape optimization
Norn, Christoffer, et al.
Proceedings of the National Academy of Sciences 118.11 (2021)

Fast differentiable DNA and protein sequence optimization for molecular design
Linder, Johannes, and Georg Seelig.
arXiv preprint arXiv:2005.11275 (2020)

2.2.AlphaFold2-based

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
Petti, Samantha, et al.
bioRxiv (2021)
Notes of mine

Deep learning methods for designing proteins scaffolding functional sites
Wang, J., et al.
bioRxiv (2021)

AlphaDesign: A de novo protein design framework based on AlphaFold
Jendrusch, Michael, Jan O. Korbel, and S. Kashif Sadiq.
bioRxiv (2021)

Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design
Moffat, Lewis, Joe G. Greener, and David T. Jones.
bioRxiv (2021)

3.Function to Scaffold

These models design a backbone (scaffold/template) from a desired function.

3.1.GAN-based

Conditioning by adaptive sampling for robust design
Brookes, David, Hahnbeom Park, and Jennifer Listgarten.
International Conference on Machine Learning. PMLR (2019)

RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network
Sabban, Sari, and Mikhail Markovsky.
F1000Research 9 (2020)

Fully differentiable full-atom protein backbone generation
Anand, Namrata, Raphael Eguchi, and Po-Ssu Huang.
OpenReview (2019)

3.2.VAE-based

IG-VAE: generative modeling of immunoglobulin proteins by direct 3D coordinate generation
Eguchi, Raphael R., et al.
bioRxiv (2020)

3.3.DAE-based

Function-guided protein design by deep manifold sampling
Gligorijevic, Vladimir, Stephen Ra, Daniel Berenberg, Richard Bonneau, and Kyunghyun Cho.
NeurIPS 2021

3.4.MLP-based

A backbone-centred energy function of neural networks for protein design
Huang, B., Xu, Y., Hu, X., et al.
Nature (2022)

4.Scaffold to Sequence

These models identify an amino-acid sequence for a given backbone (scaffold/template).

4.1.MLP-based

3D representations of amino acids—applications to protein sequence comparison and classification
Li, Jie, and Patrice Koehl.
Computational and Structural Biotechnology Journal 11.18 (2014)

SPIN2: Predicting sequence profiles from protein structures using deep neural networks
O'Connell, James, et al.
Proteins: Structure, Function, and Bioinformatics 86.6 (2018)

Computational protein design with deep learning neural networks
Wang, Jingxue, et al.
Scientific Reports 8.1 (2018)

4.2.VAE-based

Design of metalloproteins and novel protein folds using variational autoencoders
Greener, Joe G., Lewis Moffat, and David T. Jones.
Scientific Reports 8.1 (2018)

4.3.Bi-LSTM+2D-ResNet

To improve protein sequence profile prediction through image captioning on pairwise residue distance map
Chen, Sheng, et al.
Journal of Chemical Information and Modeling 60.1 (2019)

4.4.CNN-based

ProDCoNN: Protein design using a convolutional neural network
Zhang, Yuan, et al.
Proteins: Structure, Function, and Bioinformatics 88.7 (2020)

A structure-based deep learning framework for protein engineering
Shroff, Raghav, et al.
bioRxiv (2019)

Protein sequence design with a learned potential
Anand-Achim, Namrata, et al.
bioRxiv (2021)

Protein sequence design by explicit energy landscape optimization
Norn, Christoffer, et al.
bioRxiv (2020)

4.5.GNN-based

Fast and flexible protein design using deep graph neural networks
Strokach, Alexey, et al.
Cell Systems 11.4 (2020)

TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs
Li, Alex J., et al.
NeurIPS 2021

4.6.GAN-based

De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks
Karimi, Mostafa, et al.
Journal of Chemical Information and Modeling 60.12 (2020)

4.7.Transformer-based

Generative models for graph-based protein design
Ingraham, John, et al.
NeurIPS 2019

Rotamer-Free Protein Sequence Design Based on Deep Learning and Self-Consistency
Liu, Yufeng, et al.
Nature Portfolio (2022)

4.8.ResNet-based

DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet
Qi, Yifei, and John ZH Zhang.
Journal of Chemical Information and Modeling 60.3 (2020)

5.Function to Sequence

These models generate sequences from a desired function.

5.1.CM-Align

AutoFoldFinder: An Automated Adaptive Optimization Toolkit for De Novo Protein Fold Design
Zhang, Shuhao, Youjun Xu, Jianfeng Pei, and Luhua Lai.
NeurIPS 2021

5.2.VAE-based

Variational auto-encoding of protein sequences
Sinai, Sam, et al.
arXiv preprint arXiv:1712.03346 (2017)

Deep generative models for T cell receptor protein sequences
Davidsen, Kristian, et al.
eLife 8 (2019)

How to hallucinate functional proteins
Costello, Zak, and Hector Garcia Martin.
arXiv preprint arXiv:1903.00458 (2019)

Conditioning by adaptive sampling for robust design
Brookes, David, Hahnbeom Park, and Jennifer Listgarten.
International Conference on Machine Learning. PMLR (2019)

Generating functional protein variants with variational autoencoders
Hawkins-Hooker, Alex, et al.
PLoS Computational Biology 17.2 (2021)

Therapeutic enzyme engineering using a generative neural network
Giessel, Andrew, et al.
Scientific Reports 12.1 (2022)

Function-guided protein design by deep manifold sampling
Gligorijevic, Vladimir, et al.
bioRxiv (2021)

Deep generative models create new and diverse protein structures
Lin, Zeming, Tom Sercu, Yann LeCun, and Alexander Rives.
NeurIPS 2021

5.3.GAN-based

Generative modeling for protein structures
Anand, Namrata, and Possu Huang.
NeurIPS 2018

Generating protein sequences from antibiotic resistance genes data using Generative Adversarial Networks
Chhibbar, Prabal, and Arpit Joshi.
arXiv preprint arXiv:1904.13240 (2019)

ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework
Han, Xi, et al.
Computers & Chemical Engineering 131 (2019)

Conditional Generative Modeling for De Novo Protein Design with Hierarchical Functions
Kucera, Tim, Matteo Togninalli, and Laetitia Meng-Papaxanthos.
bioRxiv (2021)

Expanding functional protein sequence spaces using generative adversarial networks
Repecka, Donatas, et al.
Nature Machine Intelligence 3.4 (2021)

HelixGAN: A bidirectional Generative Adversarial Network with search in latent space for generation under constraints
Xie, Xuezhi, and Philip M. Kim.
NeurIPS 2021

5.4.NLP-based

Recurrent neural network model for constructive peptide design
Müller, Alex T., Jan A. Hiss, and Gisbert Schneider.
Journal of Chemical Information and Modeling 58.2 (2018)

PepCVAE: Semi-supervised targeted design of antimicrobial peptide sequences
Das, Payel, et al.
arXiv preprint arXiv:1810.07743 (2018)

Deep learning to design nuclear-targeting abiotic miniproteins
Schissel, Carly K., et al.
Nature Chemistry 13.10 (2021)

Protein design and variant prediction using autoregressive generative models
Shin, Jung-Eun, et al.
Nature Communications 12.1 (2021)

ECNet is an evolutionary context-integrated deep learning framework for protein engineering
Luo, Yunan, et al.
Nature Communications 12.1 (2021)

Guided Generative Protein Design using Regularized Transformers
Castro, Egbert, et al.
arXiv preprint arXiv:2201.09948 (2022)

Generative Language Modeling for Antibody Design
Shuai, Richard W., Jeffrey A. Ruffolo, and Jeffrey J. Gray.
bioRxiv (2021)

Deep neural language modeling enables functional protein generation across families
Madani, Ali, et al.
bioRxiv (2021)

BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning
Prihoda, David, et al.
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022

5.5.ResNet-based

Accelerating protein design using autoregressive generative models
Riesselman, Adam, et al.
bioRxiv (2019)

5.6.Bayesian-based

AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
Khan, Asif, et al.
arXiv preprint (2022)

5.7.Pretrained-based

ProGen: Language modeling for protein generation
Madani, Ali, et al.
arXiv preprint arXiv:2004.03497 (2020)

5.8.RL-based

Model-based reinforcement learning for biological sequence design
Angermueller, Christof, et al.
International Conference on Learning Representations (2019)

6.Molecular Design Models

To learn about a wider variety of design models, the following recommended papers from molecular design are also helpful.

CELLS: Cost-Effective Evolution in Latent Space for Goal-Directed Molecular Generation
Chen, Zhiyuan, et al.
arXiv preprint arXiv:2112.00905 (2021)

A 3D Generative Model for Structure-Based Drug Design
Luo, Shitong, et al.
Advances in Neural Information Processing Systems 34 (2021)

to be continued...