👏 A Survey of Artificial Intelligence in Drug Discovery

💡 Artificial intelligence has been widely applied in drug discovery over the past decade and is still gaining popularity. This repository compiles a collection of works on related areas, based on the manuscript Artificial Intelligence in Drug Discovery: Applications and Techniques by Jianyuan Deng et al. The preprint version is available on ResearchGate. We hope you find it useful for your research (a citation is provided below).

🔔 This repository is updated regularly.

@article{deng2022artificial,
  title={Artificial intelligence in drug discovery: applications and techniques},
  author={Deng, Jianyuan and Yang, Zhibo and Ojima, Iwao and Samaras, Dimitris and Wang, Fusheng},
  journal={Briefings in Bioinformatics},
  volume={23},
  number={1},
  pages={bbab430},
  year={2022},
  publisher={Oxford University Press}
}

Contents


1.1 General Drug Discovery
  • Integration of virtual and high-throughput screening (Nat Rev Drug Discov 2002) [Paper]
  • Chemical space and biology (Nature 2004) [Paper]
  • Computer-based de novo design of drug-like molecules (Nat Rev Drug Discov 2005) [Paper]
  • On Outliers and Activity Cliffs-Why QSAR Often Disappoints (J Chem Inf Model 2006) [Paper]
  • Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem (J Chem Inf Model 2007) [Paper]
  • Virtual screening: an endless staircase? (Nat Rev Drug Discov 2010) [Paper]
  • Privileged Scaffolds for Library Design and Drug Discovery (Curr Opin Chem Biol 2010) [Paper]
  • Principles of early drug discovery (Br J Pharmacol 2011) [Paper]
  • Recognizing Pitfalls in Virtual Screening: A Critical Review (J Chem Inf Model 2012) [Paper]
  • Multi-objective optimization methods in drug design (Drug Discov Today 2013) [Paper]
  • Finding the rules for successful drug optimisation (Drug Discov Today 2014) [Paper]
  • Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry (J Med Chem 2014) [Paper]
  • Automating Drug Discovery (Nat Rev Drug Discov 2017) [Paper]
  • Interpretation of Quantitative Structure−Activity Relationship Models: Past, Present, and Future (J Chem Inf Model 2017) [Paper]
  • Advances and Challenges in Computational Target Prediction (J Chem Inf Model 2019) [Paper]
  • Duality of activity cliffs in drug discovery (Expert Opin Drug Discov 2019) [Paper]
  • QSAR without borders (Chem Soc Rev 2020) [Paper]
  • Designing small molecules for therapeutic success: A contemporary perspective (Drug Discov Today 2021) [Paper]
  • Phenotypic drug discovery: recent successes, lessons learned and new directions (Nat Rev Drug Discov 2022) [Paper]
  • Is the reductionist paradox an Achilles Heel of drug discovery? (J Comput Aided Mol Des 2022) [Paper]
1.2 Drug Discovery in the AI Era
  • Machine-learning approaches in drug discovery: methods and applications (Drug Discov Today 2015) [Paper]
  • The rise of deep learning in drug discovery (Drug Discov Today 2018) [Paper]
  • Applications of machine learning in drug discovery and development (Nat Rev Drug Discov 2019) [Paper]
  • Deep Learning in Chemistry (J Chem Inf Model 2019) [Paper]
  • Deep learning for molecular design—a review of the state of the art (Mol Syst Des Eng 2019) [Paper]
  • Efficient molecular encoders for virtual screening (Drug Discov Today Technol 2019) [Paper]
  • Artificial intelligence in chemistry and drug design (J Comput Aided Mol Des 2020) [Paper]
  • Graph convolutional networks for computational drug development and discovery (Brief Bioinformatics 2020) [Paper]
  • Transfer Learning for Drug Discovery (J Med Chem 2020) [Paper]
  • Learning Molecular Representations for Medicinal Chemistry (J Med Chem 2020) [Paper]
  • Exploring chemical space using natural language processing methodologies for drug discovery (Drug Discov Today 2020) [Paper]
  • Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper]
  • A compact review of molecular property prediction with graph neural networks (Drug Discov Today 2020) [Paper]
  • Artificial intelligence in drug discovery: Recent advances and future perspectives (Expert Opin Drug Discov 2021) [Paper]
  • Artificial intelligence in drug discovery and development (Drug Discov Today 2021) [Paper]
  • Graph neural networks for automated de novo drug design (Drug Discov Today 2021) [Paper]
  • De novo molecular design and generative models (Drug Discov Today 2021) [Paper]
  • Artificial Intelligence for Drug Discovery (KDD 2021) [Paper] [Website] [TorchDrug]
  • Generative Deep Learning for Targeted Compound Design (J Chem Inf Model 2021) [Paper]
  • Explainable Machine Learning for Property Predictions in Compound Optimization (J Med Chem 2021) [Paper]
  • A decade of machine learning-based predictive models for human pharmacokinetics: Advances and challenges (Drug Discov Today 2021) [Paper]
  • Defining Levels of Automated Chemical Design (J Med Chem 2022) [Paper]
  • Evaluation guidelines for machine learning tools in the chemical sciences (Nat Rev Chem 2022) [Paper]
  • Combining DELs and machine learning for toxicology prediction (Drug Discov Today 2022) [Paper]

Side Notes: Successful Applications

  • Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Insilico Medicine)
  • A Deep Learning Approach to Antibiotic Discovery (Cell 2020) [Paper] [Code] (MIT CSAIL)
  • "BenevolentAI Announces First Patient Dosed In Its Atopic Dermatitis Clinical Trial" [Link] (BenevolentAI)
  • "Exscientia Announces First AI-Designed Immuno-Oncology Drug to Enter Clinical Trials" [Link] (Exscientia)
  • "Breaking Big Pharma's AI barrier: Insilico Medicine uncovers novel target, new drug for pulmonary fibrosis in 18 months" [Link] (Insilico Medicine)
1.3 AI-Driven Drug Discovery: Hope or Hype
  • Rethinking drug design in the artificial intelligence era (Nat Rev Drug Discov 2020) [Paper]
  • Towards reproducible computational drug discovery (J Cheminf 2020) [Paper]
  • Current Trends, Overlooked Issues, and Unmet Challenges in Virtual Screening (J Chem Inf Model 2020) [Paper]
  • Drug discovery with explainable artificial intelligence (Nat Mach Intell 2020) [Paper]
  • Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet (Drug Discov Today 2021) [Paper]
  • Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data used for AI in drug discovery (Drug Discov Today 2021) [Paper]
  • Critical assessment of AI in drug discovery (Expert Opin Drug Discov 2021) [Paper]
  • An Insight into Artificial Intelligence in Drug Discovery: An Interview with Professor Gisbert Schneider (Expert Opin Drug Discov 2021) [Paper]

2.1 Large-Scale Databases

PubChem
ChEMBL
ZINC
  • ZINC 15 – Ligand Discovery for Everyone (J Chem Inf Model 2015) [Paper] [Website]
Others
  • Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases (Brief Bioinformatics 2019) [Paper]
  • DrugBank --- DrugBank 5.0: a major update to the DrugBank database for 2018 (Nucleic Acids Res 2018) [Paper] [Website] [Download]
  • KEGG --- KEGG as a reference resource for gene and protein annotation (Nucleic Acids Res 2016) [Paper] [Website] [Download]
  • PDBbind --- PDB-wide collection of binding data: current status of the PDBbind database (Bioinformatics 2015) [Paper] [Website] [Download]
  • BindingDB --- BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology (Nucleic Acids Res 2016) [Paper] [Website] [Download]
  • DUD --- Benchmarking Sets for Molecular Docking (J Med Chem 2006) [Paper] [Website]
  • DUD-E --- Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking (J Med Chem 2012) [Paper] [Website]
  • MUV --- Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data (J Chem Inf Model 2009) [Paper] [Website]
  • STITCH --- STITCH: interaction networks of chemicals and proteins (Nucleic Acids Res 2008) [Paper] [Website]
  • GLL&GDD --- Ligand and Decoy Sets for Docking to G Protein-Coupled Receptors (J Chem Inf Model 2012) [Paper] [Website]
  • NRLiSt BDB --- NRLiSt BDB, the Manually Curated Nuclear Receptors Ligands and Structures Benchmarking Database (J Med Chem 2014) [Paper] [Website]
  • SIDER --- The SIDER database of drugs and side effects (Nucleic Acids Res 2016) [Paper] [Website]
  • Offsides&Twosides --- Data-driven prediction of drug effects and interactions (Sci Transl Med 2012) [Paper] [Website]
  • DILIrank --- DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans (Drug Discov Today 2016) [Paper] [Website]
  • UniProt --- UniProt: the universal protein knowledgebase in 2021 (Nucleic Acids Res 2021) [Paper] [Website]
  • PDB --- The Protein Data Bank (Nucleic Acids Res 2000) [Paper] [Website]

2.2 Small Molecule Representations

  • Molecular representations in AI‑driven drug discovery: a review and practical guide (J Cheminf 2020) [Paper]

2.3 Benchmark Platforms

MoleculeNet
MolMapNet
  • Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations (Nat Mach Intell 2021) [Paper] [Code]
ChemProp
  • Analyzing Learned Molecular Representations for Property Prediction (J Chem Inf Model 2019) [Paper] [Code] [Website]
REINVENT
  • Molecular De Novo design using Recurrent Neural Networks and Reinforcement Learning (J Cheminf 2017) [Paper] [Code]
  • REINVENT 2.0 – an AI Tool for De Novo Drug Design (J Chem Inf Model 2020) [Paper] [Code]
GraphINVENT
  • Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
  • Practical Notes on Building Molecular Graph Generative Models (Applied AI Letters 2020) [Paper] [Code]
Guacamol
  • GuacaMol: Benchmarking Models for de Novo Molecular Design (J Chem Inf Model 2019) [Paper] [Code]
MOSES
  • Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models (Front Pharmacol 2020) [Paper] [Code]
ATOM3D

3.1 Convolutional Neural Networks

Task*: Molecular Property Prediction; Representation*: Images

  • Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper] (Techs - CNN + SVM)
  • Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models (aka: Chemception; arXiv 2017) [Paper]
  • Toxic Colors: The Use of Deep Learning for Predicting Toxicity of Compounds Merely from Their Graphic Images (aka: Toxic Colors; J Chem Inf Model 2018) [Paper]
  • KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images (J Cheminf 2019) [Paper] [Code] (Techs - CNN)
  • Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests (J Chem Inf Model 2019) [Paper] (Techs - CNN)
  • DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representation (Chem Sci 2020) [Paper] [Code]

Task*: Molecular Property Prediction; Representation*: Fingerprints

  • Massively Multitask Networks for Drug Discovery (arXiv 2015) [Paper]
  • Convolutional Networks on Graphs for Learning Molecular Fingerprints (NeurIPS 2015) [Paper] [Code]

Side Note: Molecular Structure Extraction and Recognition

  • Molecular Structure Extraction from Documents Using Deep Learning (J Chem Inf Model 2019) [Paper]
  • DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature (J Cheminf 2021) [Paper]
  • DECIMER: towards deep learning for chemical image recognition (J Cheminf 2020) [Paper] [Code]
  • DECIMER 1.0: Deep Learning for Chemical Image Recognition using Transformers (ChemRxiv 2021) [Paper]
  • Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions (Chem Sci 2021) [Paper] [Code]

3.2 Recurrent Neural Networks

Task*: Molecular Property Prediction; Representation*: SMILES Strings

  • SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties (aka: SMILES2Vec; arXiv 2017) [Paper]
  • Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)

Task*: Molecule Generation; Representation*: SMILES Strings

  • Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
  • Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks (aka: CharRNN; ACS Cent Sci 2018) [Paper] (Techs - Transfer Learning)
  • Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
  • Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
  • Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
  • Scaffold-Constrained Molecular Generation (J Chem Inf Model 2020) [Paper] (Techs - RL: Policy-based Hill Climbing)

Task*: Molecule Generation; Representation*: Molecular Graphs

  • GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models (aka: GraphRNN; ICML 2018) [Paper] [Code]
  • Learning Deep Generative Models of Graphs (ICML 2018) [Paper] [Code] (Techs - RNN: LSTM)
  • MolecularRNN: Generating realistic molecular graphs with optimized properties (arXiv 2019) [Paper]
  • A Deep-Learning View of Chemical Space Designed to Facilitate Drug Discovery (aka: DESMILES; J Chem Inf Model 2020) [Paper]

3.3 Graph Neural Networks

Task*: Molecular Property Prediction; Representation*: Molecular Graphs

  • Molecular Graph Convolutions: Moving Beyond Fingerprints (aka: Weave; J Comput Aided Mol Des 2016) [Paper]
  • Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction (J Chem Inf Model 2017) [Paper]
  • Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
  • Neural Message Passing for Quantum Chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
  • SchNet: A continuous-filter convolutional neural network for modeling quantum interactions (aka: SchNet; NeurIPS 2017) [Paper] [Code]
  • Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
  • Large-scale comparison of machine learning methods for drug target prediction on ChEMBL (aka: SmilesLSTM; Chem Sci 2018) [Paper] [Code] (Techs - RNN + GNN + Multi-Task Learning)
  • PotentialNet for Molecular Property Prediction (aka: PotentialNet; ACS Cent Sci 2018) [Paper] (Techs - GNN: GCNN + Multi-Task Learning)
  • Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective (aka: MGCN; AAAI 2019) [Paper]
  • Deep Learning-Based Prediction of Drug-Induced Cardiotoxicity (J Chem Inf Model 2019) [Paper] [Code] (Techs - GCN + Multi-task Learning)
  • DeepChemStable: Chemical Stability Prediction with an Attention-Based Graph Convolution Network (J Chem Inf Model 2019) [Paper] (Techs - GCN + Attention)
  • Analyzing Learned Molecular Representations for Property Prediction (aka: Chemprop, D-MPNN; J Chem Inf Model 2019) [Paper] [Code]
  • Molecule Property Prediction Based on Spatial Graph Embedding (aka: C-SGEN; J Chem Inf Model 2019) [Paper] [Code]
  • Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism (aka: Attentive FP; J Med Chem 2019) [Paper] [Code]
  • Graph convolutional neural networks as “general-purpose” property predictors: the universality and limits of applicability (J Chem Inf Model 2020) [Paper]
  • N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules (aka: N-Gram Graph; NeurIPS 2019) [Paper]
  • Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Multi-Task Learning)
  • A self‑attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility (J Cheminf 2020) [Paper] [Code] (Techs - MPNN + Self-Attention: Interpretability)
  • Chemically Interpretable Graph Interaction Network for Prediction of Pharmacokinetic Properties of Drug-Like Molecules (aka: CIGIN; AAAI 2020) [Paper] [Code]
  • Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
  • Directional Message Passing for Molecular Graphs (aka: DimeNet; ICLR 2020) [Paper] [Code]
  • Drug–target affinity prediction using graph neural network and contact maps (RSC Advances 2020) [Paper] (Techs - GCN + GAT)
  • ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
  • Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - GGNN + Meta Learning: MAML, FO-MAML, ANIL)

Task*: Molecule Generation; Representation*: Molecular Graphs

  • Multi‑objective de novo drug design with conditional graph generative model (J Cheminf 2018) [Paper] [Code] (Techs - Conditional Graph Generative Model: MolMP, MolRNN)
  • Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: PPO)
  • Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Q-learning)
  • Improving Molecular Design by Stochastic Iterative Target Augmentation (ICML 2020) [Paper] [Code] (Techs - VSeq2Seq/HierGNN + Semi-Supervised Learning)
  • DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (J Cheminf 2020) [Paper] (Techs - GCN + RL: PPO)
  • Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: PPO)
  • Graph Networks for Molecular Design (aka: GraphINVENT; Mach Learn: Sci Technol 2021) [Paper] [Code]
  • De novo drug design using reinforcement learning with graph-based deep generative models (aka: RL-GraphINVENT; ChemRxiv 2021) [Paper] [Code]

Side Note: Common GNN Models

  • Recurrent GNNs --- Gated graph sequence neural networks (aka: GGNN; ICLR 2016) [Paper] [Code]
  • Convolutional GNNs (Spectral-based) --- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (aka: ChebNet; NeurIPS 2016) [Paper] [Code]
  • Convolutional GNNs (Spectral-based) --- Semi-supervised classification with graph convolutional networks (aka: GraphConv; ICLR 2017) [Paper] [Code]
  • Convolutional GNNs (Spatial-based) --- Neural message passing for quantum chemistry (aka: MPNN; ICML 2017) [Paper] [Code]
  • Convolutional GNNs (Spatial-based) --- Inductive Representation Learning on Large Graphs (aka: GraphSAGE; NeurIPS 2017) [Paper] [Code]
  • Convolutional GNNs (Spatial-based) --- Graph Attention Networks (aka: GAT; ICLR 2018) [Paper] [Code]
  • Convolutional GNNs (Spatial-based) --- How powerful are graph neural networks? (aka: GIN; ICLR 2019) [Paper] [Code]

3.4 Variational Autoencoders

Task*: Molecule Generation; Representation*: SMILES Strings

  • Automatic chemical design using a data-driven continuous representation of molecules (arXiv 2016; ACS Cent Sci 2018) [Paper] [Code] (Techs - VAE)
  • Grammar Variational Autoencoder (aka: GrammarVAE; ICML 2017) [Paper]
  • Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
  • Syntax-Directed Variational Autoencoder for Structured Data (aka: SD-VAE; ICLR 2018) [Paper] [Code]
  • Conditional Molecular Design with Deep Generative Models (aka: Continuous SSVAE; J Chem Inf Model 2018) [Paper] [Code]
  • Molecular generative model based on conditional variational autoencoder for de novo molecular design (aka: CVAE; J Cheminf 2018) [Paper] [Code] (Techs - VAE)
  • Constrained Graph Variational Autoencoders for Molecule Design (aka: CGVAE; NeurIPS 2018) [Paper] [Code]
  • NEVAE: A Deep Generative Model for Molecular Graphs (aka: NeVAE; AAAI 2019) [Paper] [Code]
  • De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping (aka: GTMVAE; J Chem Inf Model 2019) [Paper] (Techs - Autoencoder + RNN)
  • Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation (aka: re-balanced VAE; ACM BCB 2020) [Paper] [Code] (Techs - RNN: BiGRU + VAE)
  • CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models (aka: CogMol; NeurIPS 2020) [Paper] [Code]

VAE Variant: AAE

  • Application of Generative Autoencoder in De Novo Molecular Design (Mol Inform 2017) [Paper]
  • druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico (aka: druGAN; Mol Pharm 2017) [Paper]
  • Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery (aka: SAAE; Mol Pharm 2018) [Paper]

Task*: Molecule Generation; Representation*: Molecular Graphs

  • GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders (aka: GraphVAE; arXiv 2018) [Paper]
  • Junction Tree Variational Autoencoder for Molecular Graph Generation (aka: JT-VAE; ICML 2018) [Paper] [Code]
  • Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders (aka: Regularized VAE; NeurIPS 2018) [Paper]
  • Molecular Hypergraph Grammar with Its Application to Molecular Optimization (aka: MHG-VAE; ICML 2019) [Paper] [Code]
  • Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL)
  • Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: REINFORCE)
  • Scaffold-based molecular design using graph generative model (aka: ScaffoldVAE; arXiv 2019) [Paper]
  • Learning Multimodal Graph-to-Graph Translation for Molecule Optimization (aka: VJTNN; ICLR 2019) [Paper] [Code]
  • CORE: Automatic Molecule Optimization Using Copy & Refine Strategy (AAAI 2020) [Paper] [Code]
  • Hierarchical Generation of Molecular Graphs using Structural Motifs (aka: HierVAE; ICML 2020) [Paper] [Code] (Techs - Hierarchical VAE)
  • Compressed graph representation for scalable molecular graph generation (J Cheminf 2020) [Paper] [Code] (Techs - Non-autoregressive VAE)

Side Note: Reaction & Retrosynthesis Prediction; Representation*: Molecular Graphs

  • Generating Molecules via Chemical Reactions (ICLR 2019 Workshop) [Paper]
  • Barking up the right tree: an approach to search over molecule synthesis DAG (NeurIPS 2020) [Paper] [Code]

3.5 Generative Adversarial Networks

Task*: Molecule Generation; Representation*: SMILES Strings

  • Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: REINFORCE)
  • Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: REINFORCE)
  • Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)

Task*: Molecule Generation; Representation*: Molecular Graphs

3.6 Normalizing Flow Models

Task*: Molecule Generation; Representation*: Molecular Graphs

  • GraphNVP: An Invertible Flow Model for Generating Molecular Graphs (aka: GraphNVP; arXiv 2019) [Paper] [Code]
  • Graph Residual Flow for Molecular Graph Generation (aka: GRF; arXiv 2019) [Paper]
  • GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: PPO)
  • MoFlow: An Invertible Flow Model for Generating Molecular Graphs (aka: MoFlow; KDD 2020) [Paper] [Code]
  • GraphDF: A Discrete Flow Model for Molecular Graph Generation (aka: GraphDF; ICML 2021) [Paper]
  • Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (NeurIPS 2021) [Paper]

3.7 Transformers

Task*: Molecular Property Prediction; Representation*: SMILES Strings

  • SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction (aka: SMILES-BERT; ACM BCB 2019) [Paper] (Techs - BERT)
  • SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery (aka: SMILES Transformer; arXiv 2019) [Paper] [Code]
  • ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction (aka: ChemBERTa; arXiv 2020) [Paper] [Code]
  • Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
  • Algebraic graph-assisted bidirectional transformers for molecular property prediction (aka: AGBT; Nat Commun 2021) [Paper] [Code]
  • ChemBERTa-2: Towards Chemical Foundation Models (aka: ChemBERTa-2; arXiv 2022) [Paper]

Task*: Molecular Property Prediction; Representation*: Molecular Graphs

  • Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)

Task*: Molecule Generation; Representation*: SMILES Strings

  • Transformer-Based Generative Model Accelerating the Development of Novel BRAF Inhibitors (ACS Omega 2021) [Paper]
  • MolGPT: Molecular Generation Using a Transformer-Decoder Model (aka: MolGPT; J Chem Inf Model 2022) [Paper] [Code]

Task*: Molecule Generation; Representation*: Molecular Graphs

  • A Model to Search for Synthesizable Molecules (aka: Molecule Chef; NeurIPS 2019) [Paper] [Code]
  • Transformer neural network for protein-specific de novo drug generation as a machine translation problem (Sci Rep 2021) [Paper]

4.1 Self-Supervised Learning in Molecular Property Prediction

Generative Learning

  • Strategies for Pre-training Graph Neural Networks (ICLR 2020) [Paper] [Code] (Techs - Self-Supervised Learning)
  • Molecular representation learning with language models and domain-relevant auxiliary tasks (aka: MolBERT; NeurIPS 2020 Workshop) [Paper] [Code] (Techs - BERT + Self-Supervised Learning)
  • Self-Supervised Graph Transformer on Large-Scale Molecular Data (aka: GROVER; NeurIPS 2020) [Paper] [Code] (Techs - Graph Transformer + Self-Supervised Learning)

Contrastive Learning

  • MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (arXiv 2021) [Paper] [Code]
  • Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast (J Chem Inf Model 2022) [Paper] [Code]

4.2 Reinforcement Learning in Molecule Generation

  • Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (aka: ORGAN; arXiv 2017) [Paper] [Code] (Techs - GAN: G-RNN, D-CNN + RL: Policy-gradient REINFORCE)
  • Molecular de‑novo design through deep reinforcement learning (aka: REINVENT; J Cheminf 2017) [Paper] (Techs - RNN: GRU + RL: Policy-gradient REINFORCE)
  • Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (aka: ORGANIC; ChemRxiv 2017) [Paper] [Code] (Techs - GAN + RL: Policy-gradient REINFORCE)
  • Reinforced Adversarial Neural Computer for de Novo Molecular Design (aka: RANC; J Chem Inf Model 2018) [Paper] (Techs - GAN + RL)
  • Exploring Deep Recurrent Models with Reinforcement Learning for Molecule Design (ICLR 2018 Workshop) [Paper] (Techs - RL: Hybrid A2C, Policy-gradient PPO)
  • MolGAN: An implicit generative model for small molecular graphs (aka: MolGAN; ICML 2018 Workshop) [Paper] [Code-Tensorflow] [Code-PyTorch] (Techs - GAN + RL: Hybrid Actor-Critic DDPG)
  • Deep Reinforcement Learning for de novo Drug Design (aka: ReLeaSE; Sci Adv 2018) [Paper] [Code] (Techs - RL: Policy-gradient REINFORCE)
  • Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (aka: GCPN; NeurIPS 2018) [Paper] [Code] (Techs - GCN + RL: Policy-gradient PPO)
  • Deep learning enables rapid identification of potent DDR1 kinase inhibitors (aka: GENTRL; Nat Biotechnol 2019) [Paper] [Code] (Techs - VAE + RL: Policy-gradient REINFORCE)
  • Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design (aka: DeepFMPO; J Chem Inf Model 2019) [Paper] [Code] (Techs - RNN: BiLSTM + RL: Hybrid Actor-Critic)
  • Optimization of Molecules via Deep Reinforcement Learning (aka: MolDQN; Sci Rep 2019) [Paper] (Techs - RL: Value-based Double Q-learning)
  • Efficient learning of non‑autoregressive graph variational autoencoders for molecular graph generation (J Cheminf 2019) [Paper] [Code] (Techs - Non-autoregressive VAE + RL: Policy-gradient)
  • GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation (aka: GraphAF; ICLR 2020) [Paper] [Code] (Techs - Flow + RL: Policy-gradient PPO)
  • Reinforcement Learning for Molecular Design Guided by Quantum Mechanics (aka: MolGym; ICML 2020) [Paper] (Techs - RL: Policy-gradient PPO)
  • DeepGraphMolGen, a multi‑objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach (aka: DeepGraphMolGen; J Cheminf 2020) [Paper] (Techs - GCN + RL: Policy-gradient PPO)
  • Reinforced Molecular Optimization with Neighborhood-Controlled Grammars (aka: MNCE-RL; NeurIPS 2020) [Paper] [Code] (Techs - RL: Policy-gradient PPO)
  • Deep inverse reinforcement learning for structural evolution of small molecules (Brief Bioinform 2021) [Paper] [Code] (Techs - Inverse RL)

Side Note: Common RL Algorithms

  • Value-based --- Playing Atari with Deep Reinforcement Learning (aka: DQN; NeurIPS Workshop 2013) [Paper]
  • Value-based --- Human-level control through deep reinforcement learning (aka: DQN; Nature 2015) [Paper]
  • Value-based --- Deep Reinforcement Learning with Double Q-learning (aka: Double Q-learning; AAAI 2016) [Paper]
  • Value-based --- Prioritized Experience Replay (aka: DQN with Experience Replay; ICLR 2016) [Paper]
  • Value-based --- Dueling Network Architectures for Deep Reinforcement Learning (aka: Dueling Network; ICML 2016) [Paper]
  • Policy-gradient --- Simple statistical gradient-following algorithms for connectionist reinforcement learning (aka: REINFORCE; Mach Learn 1992) [Paper]
  • Policy-gradient --- Policy Gradient Methods for Reinforcement Learning with Function Approximation (aka: Random Policy Gradient; NeurIPS 1999) [Paper]
  • Policy-gradient --- Deterministic Policy Gradient Algorithms (aka: DPG; ICML 2014) [Paper]
  • Policy-gradient --- Trust Region Policy Optimization (aka: TRPO; ICML 2015) [Paper]
  • Policy-gradient --- Proximal Policy Optimization Algorithms (aka: PPO; arXiv 2017) [Paper]
  • Hybrid --- Continuous control with deep reinforcement learning (aka: DDPG; ICLR 2016) [Paper]
  • Hybrid --- Asynchronous Methods for Deep Reinforcement Learning (aka: A3C; ICML 2016) [Paper]

Side Note: Pareto Optimality

  • De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization (J Chem Inf Model 2020) [Paper] [Code] (Techs - Pareto Optimality)
  • Multiobjective de novo drug design with recurrent neural networks and nondominated sorting (J Cheminf 2020) [Paper] (Techs - Pareto Optimality)
  • DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology (ChemRxiv) [Paper] (Techs - Pareto Optimality)
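
For readers new to the term, Pareto optimality in multi-objective molecule design means keeping candidates that no other candidate beats on every objective at once. The snippet below is a minimal sketch of that non-dominated filter, assuming all objectives are maximized. It is not taken from any of the papers above, and the function names and example scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors not dominated by any other vector."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (predicted activity, drug-likeness) scores for four molecules.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(candidates)
print([candidates[i] for i in front])
```

This pairwise check is quadratic in the number of candidates, which is fine for small batches. Larger libraries usually call for a dedicated nondominated sorting routine.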

Side Note: Reaction & Retrosynthesis Optimization

  • Optimizing chemical reactions with deep reinforcement learning (ACS Cent Sci 2017) [Paper]

4.4 Other Learning Paradigms

Metric Learning

  • Machine-guided representation for accurate graph-based molecular machine learning (Phys Chem Chem Phys 2020) [Paper]
  • Embedding of Molecular Structure Using Molecular Hypergraph Variational Autoencoder with Metric Learning (Mol Inform 2020) [Paper]

Few-Shot Learning

  • Low Data Drug Discovery with One-Shot Learning (ACS Cent Sci 2017) [Paper] (Techs - LSTM: BiLSTM, attLSTM + GNN + Few-Shot Learning)
  • Few-Shot Graph Learning for Molecular Property Prediction (WWW 2021) [Paper] [Code]

Meta Learning

  • Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction (ICML 2020 Workshop) [Paper] [Code] (Techs - Gated GNN + Meta Learning: MAML, FO-MAML, ANIL)

Active Learning

  • ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction (aka: ASGN; KDD 2020) [Paper] [Code] (Techs - Active Learning)
  • Evidential Deep Learning for Guided Molecular Property Prediction and Discovery (NeurIPS 2020 Workshop) [Talk]
  • Batched Bayesian Optimization for Drug Design in Noisy Environments (J Chem Inf Model 2022) [Paper] [Code]

Model Interpretation

  • Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome-Inhibitor Interaction Landscapes (J Chem Inf Model 2018) [Paper]
  • Using attribution to decode binding mechanism in neural network models for chemistry (PNAS 2019) [Paper]
  • Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? (J Chem Inf Model 2019) [Paper]
  • Building of Robust and Interpretable QSAR Classification Models by Means of the Rivality Index (J Chem Inf Model 2019) [Paper]

Dataset Concerns

  • In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening (J Chem Inf Model 2019) [Paper]
  • Deep Learning-Based Imbalanced Data Classification for Drug Discovery (J Chem Inf Model 2020) [Paper] [Code]

Uncertainty Estimation

  • General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity (J Chem Inf Model 2018) [Paper]
  • Assessment and Reproducibility of Quantitative Structure-Activity Relationship Models by the Nonexpert (J Chem Inf Model 2018) [Paper]
  • Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks (J Chem Inf Model 2018) [Paper]
  • Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout (J Chem Inf Model 2019) [Paper]
  • Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks (Nat Mach Intell 2020) [Paper]
  • Assigning Confidence to Molecular Property Prediction (arXiv 2021) [Paper]
  • Gi and Pal Scores: Deep Neural Network Generalization Statistics (ICLR 2021 Workshop) [Paper]
  • Probabilistic Random Forest improves bioactivity predictions close to the classification threshold by taking into account experimental uncertainty (J Cheminf 2021) [Paper]

Representation Capacity

  • Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure (J Chem Inf Model 2019) [Paper]
  • Optimal Transport Graph Neural Networks (arXiv 2020) [Paper]

Out-of-Distribution Generalization

  • Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure-Activity Relationship Models Based on Deep Neural Networks? (J Chem Inf Model 2018) [Paper]
  • Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization (J Chem Inf Model 2018) [Paper]
  • Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds (J Chem Inf Model 2018) [Paper]

Threshold Adjustment

  • GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning (J Chem Inf Model 2021) [Paper] [Code]
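
As a rough illustration of what adjusting the decision threshold means on imbalanced data, the sketch below scans a grid of candidate thresholds and keeps the one with the highest Cohen's kappa. It is a generic, simplified example and not the GHOST procedure itself. The function names, labels, probabilities, and grid are all made up for illustration.

```python
from typing import Sequence, Tuple


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa for binary labels (0/1)."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], grid: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, kappa) for the grid value that maximizes kappa."""
    best = (grid[0], float("-inf"))
    for thr in grid:
        preds = [1 if p >= thr else 0 for p in probs]
        kappa = cohen_kappa(y_true, preds)
        if kappa > best[1]:
            best = (thr, kappa)
    return best


# Hypothetical imbalanced set: 8 inactives, 2 actives with modest probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.40]
thr, kappa = best_threshold(y_true, probs, [0.1, 0.2, 0.3, 0.4, 0.5])
print(thr, round(kappa, 3))
```

With the default cutoff of 0.5, every compound in this toy set would be predicted inactive. A lower threshold recovers the actives at the cost of one false positive. In practice the threshold should be chosen on training or validation predictions, never on the test set.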

Model Comparison

  • Validating the validation: reanalyzing a large‑scale comparison of deep learning and machine learning models for bioactivity prediction (J Comput Aided Mol Des 2020) [Paper]
  • Comparing classification models-a practical tutorial (J Comput Aided Mol Des 2021) [Paper]

Model Adoption

  • A Turing Test for Molecular Generators (J Med Chem 2020) [Paper]

Molecular Docking

  • Docking and scoring in virtual screening for drug discovery: methods and applications (Nat Rev Drug Discov 2004) [Paper]
  • Benchmarking sets for molecular docking (J Med Chem 2006) [Paper]
  • Molecular Docking: A powerful approach for structure-based drug discovery (Curr Comput Aided Drug Des 2011) [Paper]
  • Software for Molecular Docking: a review (Biophys Rev 2017) [Paper]
  • Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery (ACS Cent Sci 2020) [Paper]
  • A Deep-Learning Approach toward Rational Molecular Docking Protocol Selection (Molecules 2020) [Paper]
  • GNINA 1.0: molecular docking with deep learning (J Cheminf 2021) [Paper]