Materials for KDD2023 tutorial: Knowledge-augmented Graph Machine Learning for Drug Discovery: from Precision to Interpretability

KDD 2023 KaGML for Drug Discovery Tutorial

Time and Location

1. Time: TBD.

2. Location: TBD.

3. Zoom: Click "join zoom room" at Underline.

Tutorial abstract

Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as a limited ability to handle sparse supervision, a lack of interpretability in their learning and inference processes, and ineffective use of relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. This tutorial presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. We have recently completed a survey of KaGML works that organises the outstanding approaches into four categories following a newly defined taxonomy, and this tutorial presents the results of that work. To encourage audience participation and facilitate research in this rapidly emerging field, we also share practical resources for intelligent drug discovery and discuss in depth the potential avenues for future advancements.

Outline and Material

  • Introduction and Motivation [Slides]
  • Background of Drug Discovery [Slides]
  • Graph Machine Learning (GML) and Knowledge Graph (KG) in Drug Discovery [Slides]
  • Knowledge-augmented Graph Machine Learning (KaGML) for Drug Discovery [Slides]
    • Taxonomy of KaGML
    • Incorporating knowledge in preprocessing
    • Incorporating knowledge in pretraining
    • Incorporating knowledge in training
    • Incorporating knowledge in interpretability
  • Practical Resources [Slides]
  • Open Challenges and Future Directions [Slides]
  • Q&A Session

Presenters

Zhiqiang Zhong

Davide Mottin

Additional Relevant Materials

  1. Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability. arXiv 2023. [Paper]

    Zhiqiang Zhong, Anastasia Barkova, Davide Mottin.

  1. Neural Message Passing for Quantum Chemistry. ICML 2017. [Paper]

    Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl.

  2. Analyzing Learned Molecular Representations for Property Prediction. JCIM 2019. [Paper]

    Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, Regina Barzilay.

  3. Communicative representation learning on attributed molecular graphs. IJCAI 2020. [Paper]

    Ying Song, Shuangjia Zheng, Zhangming Niu, Zhang-Hua Fu, Yutong Lu, Yuedong Yang.

  4. KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction. IJCAI 2020. [Paper]

    Xuan Lin, Zhe Quan, Zhi-Jie Wang, Tengfei Ma, Xiangxiang Zeng.

  5. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 2020. [Paper]

    P. Gainza, F. Sverrisson, F. Monti, E. Rodolà, D. Boscaini, M. M. Bronstein, B. E. Correia.

  6. Knowledge-Embedded Message-Passing Neural Networks: Improving Molecular Property Prediction with Human Knowledge. ACS Omega 2021. [Paper]

    Tatsuya Hasebe.

  7. SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization. Bioinform. 2021. [Paper]

    Yue Yu, Kexin Huang, Chao Zhang, Lucas M. Glass, Jimeng Sun, Cao Xiao.

  8. FraGAT: a fragment-oriented multi-scale graph attention model for molecular property prediction. Bioinform. 2021. [Paper]

    Ziqiao Zhang, Jihong Guan, Shuigeng Zhou.

  9. Equivariant message passing for the prediction of tensorial properties and molecular spectra. ICML 2021. [Paper]

    Kristof T. Schütt, Oliver T. Unke, Michael Gastegger.

  10. MDNN: A Multimodal Deep Neural Network for Predicting Drug-Drug Interaction Events. IJCAI 2021. [Paper]

    Tengfei Lyu, Jianliang Gao, Ling Tian, Zhao Li, Peng Zhang, Ji Zhang.

  11. MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph. KDD 2021. [Paper]

    Mengying Sun, Jing Xing, Huijun Wang, Bin Chen, Jiayu Zhou.

  12. Highly accurate protein structure prediction with AlphaFold. Nature 2021. [Paper]

    John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis.

  13. A unified drug–target interaction prediction framework based on knowledge graph and recommendation system. Nat. Commun. 2021. [Paper]

    Qing Ye, Chang-Yu Hsieh, Ziyi Yang, Yu Kang, Jiming Chen, Dongsheng Cao, Shibo He, Tingjun Hou.

  14. scGCN is a graph convolutional networks algorithm for knowledge transfer in single cell omics. Nat. Commun. 2021. [Paper]

    Qianqian Song, Jing Su, Wei Zhang.

  15. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nat. Mach. Intell. 2021. [Paper]

    Wan Xiang Shen, Xian Zeng, Feng Zhu, Ya li Wang, Chu Qin, Ying Tan, Yu Yang Jiang, Yu Zong Chen.

  16. GemNet: Universal Directional Graph Neural Networks for Molecules. NeurIPS 2021. [Paper]

    Johannes Gasteiger, Florian Becker, Stephan Günnemann.

  17. Multi-Scale Representation Learning on Proteins. NeurIPS 2021. [Paper]

    Vignesh Ram Somnath, Charlotte Bunne, Andreas Krause.

  18. Directional Message Passing on Molecular Graphs via Synthetic Coordinates. NeurIPS 2021. [Paper]

    Johannes Gasteiger, Chandan Yeshwanth, Stephan Günnemann.

  19. Molecular Contrastive Learning with Chemical Element Knowledge Graph. AAAI 2022. [Paper]

    Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen.

  20. Structured Multi-task Learning for Molecular Property Prediction. AISTATS 2022. [Paper]

    Shengchao Liu, Meng Qu, Zuobai Zhang, Huiyu Cai, Jian Tang.

  21. scGraph: a graph neural network-based approach to automatically identify cell types. Bioinform. 2022. [Paper]

    Qijin Yin, Qiao Liu, Zhuoran Fu, Wanwen Zeng, Boheng Zhang, Xuegong Zhang, Rui Jiang, Hairong Lv.

  22. DTI-HETA: prediction of drug–target interactions based on GCN and GAT on heterogeneous graph. Brief. Bioinform. 2022. [Paper]

    Kanghao Shao, Yunhao Zhang, Yuqi Wen, Zhongnan Zhang, Song He, Xiaochen Bo.

  23. PEMP: Leveraging Physics Properties to Enhance Molecular Property Prediction. CIKM 2022. [Paper]

    Yuancheng Sun, Yimeng Chen, Weizhi Ma, Wenhao Huang, Kang Liu, Zhiming Ma, Wei-Ying Ma, Yanyan Lan.

  24. Graph Neural Networks Pretraining Through Inherent Supervision for Molecular Property Prediction. CIKM 2022. [Paper]

    Roy Benjamin, Uriel Singer, Kira Radinsky.

  25. Pre-training Molecular Graph Representation with 3D Geometry. ICLR 2022. [Paper]

    Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, Jian Tang.

  26. OntoProtein: Protein Pretraining With Gene Ontology Embedding. ICLR 2022. [Paper]

    Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Qiang Zhang, Jiazhang Lian, Huajun Chen.

  27. Spherical Message Passing for 3D Graph Networks. ICLR 2022. [Paper]

    Yi Liu, Limei Wang, Meng Liu, Xuan Zhang, Bora Oztekin, Shuiwang Ji.

  28. 3D Infomax improves GNNs for Molecular Property Prediction. ICML 2022. [Paper]

    Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, Pietro Liò.

  29. DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer. Int. J. Mol. Sci. 2022. [Paper]

    Jihye Shin, Yinhua Piao, Dongmin Bang, Sun Kim, Kyuri Jo.

  30. DENVIS: Scalable and High-Throughput Virtual Screening Using Graph Neural Networks with Atomic and Surface Protein Pocket Features. J. Chem. Inf. Model. 2022. [Paper]

    Agamemnon Krasoulis, Nick Antonopoulos, Vassilis Pitsikalis, Stavros Theodorakis.

  31. ReLMole: Molecular Representation Learning Based on Two-Level Graph Similarities. J. Chem. Inf. Model. 2022. [Paper]

    Zewei Ji, Runhan Shi, Jiarui Lu, Fang Li, and Yang Yang.

  32. KPGT: Knowledge-Guided Pre-training of Graph Transformer for Molecular Property Prediction. KDD 2022. [Paper]

    Han Li, Dan Zhao, Jianyang Zeng.

  33. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 2022. [Paper]

    Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, Boris Kozinsky.

  34. Geometry-enhanced molecular representation learning for property prediction. Nat. Mach. Intell. 2022. [Paper]

    Xiaomin Fang, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, Haifeng Wang.

  35. ComENet: Towards Complete and Efficient Message Passing for 3D Molecular Graphs. NeurIPS 2022. [Paper]

    Limei Wang, Yi Liu, Yuchao Lin, Haoran Liu, Shuiwang Ji.

  36. Knowledge-guided deep learning models of drug toxicity improve interpretation. Patterns 2022. [Paper]

    Yun Hao, Joseph D Romano, Jason H Moore.

  37. Robust deep learning–based protein sequence design using ProteinMPNN. Science 2022. [Paper]

    J. Dauparas, I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, A. Courbet, R. J. de Haas, N. Bethel, P. J. Y. Leung, T. F. Huddy, S. Pellock, D. Tischer, F. Chan, B. Koepnick, H. Nguyen, A. Kang, B. Sankaran, A. K. Bera, N. P. King, D. Baker.

  38. A Knowledge-Enhanced Multi-View Framework for Drug-Target Interaction Prediction. TKDE 2022. [Paper]

    Ying Shen, Yilin Zhang, Kaiqi Yuan, Dagang Li, Haitao Zheng.

  39. KG-MTL: Knowledge Graph Enhanced Multi-Task Learning for Molecular Interaction. TKDE 2022. [Paper]

    Tengfei Ma, Xuan Lin, Bosheng Song, Philip S. Yu, Xiangxiang Zeng.

  40. Hierarchical graph learning for protein-protein interaction. Nat. Commun. 2023. [Paper]

    Ziqi Gao, Chenran Jiang, Jiawen Zhang, Xiaosen Jiang, Lanqing Li, Peilin Zhao, Huanming Yang, Yong Huang, Jia Li.

Molecular and Structural
  1. logP. Type: Type. [Paper]

    A measure of a molecule’s hydrophobicity, defined as its partition coefficient between a nonpolar and a polar solvent, commonly used to predict drug absorption and distribution.

  2. rotatable bond. Type: Type. [Paper]

    Annotation of the (non)rotatable bond.

  3. MolMap. Type: Software. [Paper]

    A method to visualise molecular structures in 3D by mapping atomic properties onto a 3D grid, allowing for the exploration and analysis of molecular interactions and properties.

  4. RDKit. Type: Software. [Paper]

    An open-source package to generate chemical features.

  5. UFF. Type: Table. [Paper]

    A molecular mechanics force field designed for the full periodic table.

  6. Mordred. Type: Software. [Paper]

    A tool for generating molecular descriptors, which are mathematical representations of molecular structures used for molecular property analysis.

  7. OpenBabel. Type: Software. [Paper]

    An open-source molecular modelling software that provides a comprehensive toolkit for molecular conversion, visualisation, and analysis.

  8. MoleculeNet. Type: Database. [Paper]

    A benchmark for molecular machine learning that compares model performance on various molecular property prediction tasks such as solubility, melting point, and binding affinity.

  9. Ptable. Type: Table. Resource

    A periodic table of chemical elements classified by atomic number, electron configurations, and chemical properties into groups and periods, providing a systematic overview of elements.

Compounds
  1. ChEMBL. Type: Database. [Paper]

    A database of bioactive molecules, assays, and potency information for drug discovery and pharmaceutical research, used to facilitate target identification and selection.

  2. PubChem. Type: Database. [Paper]

    Open database of chemical substances that contains information on their 2D and 3D structures, identifiers, properties, biological activities and occurrence in nature.

  3. ChEBI. Type: Ontology, Database. [Paper]

    An open-source resource for molecular biology and biochemistry, providing a systematic and standardised vocabulary of molecular entities focused on small chemical compounds.

  4. KEGG Compound. Type: Database. [Paper]

    A database of small molecular compounds, including their structures, reactions, pathways, and functions, used to provide information on metabolic pathways and cellular processes.

  5. DrugBank. Type: Database. [Paper]

    A database that includes small molecular compounds, biologics, and natural products, providing information on their properties, mechanisms, and interactions used in drug discovery.

Drugs and Targets
  1. DDInter. Type: Database. [Paper]

    A database of drug-drug interactions, providing information on interaction mechanisms, risk levels, and management strategies, used to support clinical decision-making and drug safety.

  2. TCRD. Type: Database. [Paper]

    A database that aggregates information on proteins targeted by drugs and assigns each a development/druggability level.

  3. OpenTargets. Type: Database. [Paper]

    A database that integrates diverse genomic and molecular data to provide a comprehensive view of the relationships between diseases, genes, and molecular targets.

  4. TTD. Type: Database. [Paper]

    A publicly available database that provides information on protein and nucleic acid targets, drugs that target them and related diseases, used to advance drug discovery and development.

  5. PharmGKB. Type: Database. [Paper]

    A resource that provides information on the impact of human genetic variation on drug response, used to advance precision and personalised drug therapy.

  6. e-TSN. Type: Web platform. [Paper]

    A platform that integrates knowledge on disease-target associations used for target identification. These associations were extracted from literature by using NLP techniques.

  7. nSIDES. Type: Database. [Paper]

    Multiple resources made available by the Tatonetti lab on drug side effects, drug-drug interactions and pediatric drug safety.

  8. SIDER. Type: Database. [Paper]

    A database of marketed drugs and their side effects, providing information on the frequency, type, and severity of adverse events, used to advance drug safety and pharmacovigilance.

Genes and Proteins
  1. GeneOntology. Type: Ontology. [Paper]

    A structured and standardised ontology of gene functions, used to describe and categorise the functions of genes and gene products in a consistent and interoperable manner.

  2. Entrez. Type: Database. [Paper]

    A database that includes nucleotide and protein sequences, genomic maps, taxonomy, and chemical compounds by referencing other databases, used to query various biomedical data.

  3. Ensembl. Type: Database. [Paper]

    A database that provides information on annotated genes, multiple sequence alignments and disease for a variety of species, including humans.

  4. KEGG Genes. Type: Database. [Paper]

    A database that provides information on genes for complete genomes, their associated pathways, and functions in various organisms.

  5. BioGRID. Type: Database. [Paper]

    A database of protein and genetic interactions curated from high-throughput experimental data sources in a variety of organisms. It includes a tool to create graphs of interactions.

  6. UniProt. Type: Database. [Paper]

    A database of protein information, including their sequences, structures, and post-translational modifications.

  7. STRING. Type: Database. [Paper]

    A database of protein-protein interactions and functional associations, integrating diverse data sources and evidence to provide a weighted network of functional relationships.

  8. HumanNet. Type: Database. [Paper]

    A network of protein-protein and functional gene interactions, constructed by integrating high-throughput datasets and literature, used for disease gene prediction.

  9. STITCH. Type: Database. [Paper]

    A database of known and predicted interactions between chemicals and proteins (physical and functional associations), used for the study of molecular interactions.

  10. PDB. Type: Database. [Paper]

    A database that provides information on the 3D structure of proteins, nucleic acids, and complex molecular assemblies, obtained experimentally or predicted.

  11. RNAcentral. Type: Database. [Paper]

    A repository that integrates information on non-coding RNA sequences for a variety of organisms and assigns each sequence a unique identifier.

Pathways
  1. Reactome. Type: Database. [Paper]

    A database that stores and curates information about the molecular pathways in humans, providing insights into cellular processes and disease mechanisms.

  2. KEGG pathways. Type: Database. [Paper]

    A database of curated biological pathways and interconnections between them, manually represented as pathway maps of molecular reactions and interactions.

  3. WikiPathways. Type: Database. [Paper]

    A database of biological pathways that integrates information from several databases, which aims to provide an overview of molecular interactions and reactions.

Disease
  1. Disease Ontology. Type: Ontology. [Paper]

    Disease Ontology (DO) is an ontology of human disease that integrates MeSH, ICD, OMIM, NCI Thesaurus and SNOMED nomenclatures.

  2. MonDO. Type: Ontology. [Paper]

    A semi-automatically constructed ontology that unifies terminology across different disease ontologies.

  3. Orphanet. Type: Database. Resource

    A database that maintains information on rare diseases and orphan drugs using cross-references to other commonly used ontologies.

  4. OMIM. Type: Database. [Paper]

    A comprehensive, searchable database of gene-disease associations for Mendelian disorders.

  5. KEGG Disease. Type: Database. [Paper]

    A database of disease entries that are characterised by their perturbants (genetic or environmental factors, drugs, and pathogens).

  6. ICD-11. Type: Ontology. [Paper]

    The 11th version of the international resource for recording health and clinical data in a standardised format that is constantly updated.

  7. DisGeNET. Type: Database. [Paper]

    A database that integrates manually curated data from GWAS, animal models, and scientific literature to identify gene-disease associations. It can be used for target identification and prioritisation.

  8. DISEASES. Type: Database. [Paper]

    A database for disease-gene associations based on manually curated data, cancer mutation data, GWAS, and automatic text mining.

  9. GWAS Catalog. Type: Database. [Paper]

    Repository of published Genome-Wide Association Studies (GWAS) for investigating the impact of genomic variants on complex diseases.

  10. SemMedDB. Type: Database. [Paper]

    A database that provides information on the relationships between genes and diseases, extracted from the biomedical literature.

  11. OncoKB. Type: Database. [Paper]

    A precision oncology knowledge base containing information on human genetic alterations detected in different cancer types.

  12. HPO. Type: Ontology. [Paper]

    The Human Phenotype Ontology (HPO) is an ontology of human phenotypes and a database of disease-phenotype associations with cross-references to other relevant databases.

Medical Terms and Anatomy
  1. Uberon. Type: Ontology. [Paper]

    A multi-species anatomy ontology. It covers various anatomical systems for organs and tissues.

  2. BRENDA. Type: Ontology. [Paper]

    A tissue ontology for enzyme sources, comprising tissues, cell lines, cell types and cultures.

  3. TISSUES. Type: Database. [Paper]

    A database for gene expression in tissues that contains manually curated knowledge, proteomics, transcriptomics, and automatic text mining. Annotated with BRENDA tissue ontology.

  4. MeSH. Type: Vocabulary. [Paper]

    A comprehensive controlled vocabulary used for biomedical and health-related information.

  5. UMLS. Type: Ontology. [Paper]

    A database of biomedical terminologies and ontologies that integrates and harmonises data from a variety of sources to support clinical documentation and research in healthcare.

Knowledge Graphs
  1. Hetionet. Intended Usage: Drug discovery, Drug repurposing, etc. [Paper]

    An integrated KG of more than 12,000 nodes representing various biological, medical and social entities and their relationships. It is a valuable resource combining many different databases that can be used for drug discovery and repurposing.

  2. PharmKG. Intended Usage: Drug discovery. [Paper]

    A comprehensive biomedical KG integrating information from various databases, literature, and experiments. It is mainly centred around interactions between genes, diseases and drugs.

  3. DRKG. Intended Usage: Drug repurposing. [Paper]

    A large-scale, cross-domain KG that integrates information about drugs, proteins, diseases, and chemical compounds. It is based on Hetionet, and it was used for drug repurposing for COVID-19.

  4. CKG. Intended Usage: Biomarker discovery, Drug prioritisation. [Paper]

    A KG developed for precision medicine that combines various databases and integrates clinical and omics data. It allows for automated upload and integration of new omics data with pre-existing knowledge.

  5. OpenBioLink. Intended Usage: Drug discovery. [Paper]

    An open-source KG that integrates diverse biomedical data from various databases. It was developed to enable benchmarking of ML algorithms.

  6. BioKG. Intended Usage: Pathway discovery, Drug discovery. [Paper]

    A KG that integrates information about genes, proteins, diseases, drugs, and other biological entities. It aims at providing a standardised KG in a unified format with stable IDs.

  7. Bioteque. Intended Usage: Broad usage. [Paper]

    A KG that enables the discovery of relationships between genes, proteins, diseases, drugs, and other entities, providing an overview of biological knowledge for use in biomedical research and personalised medicine.

  8. Harmonizome. Intended Usage: Drug discovery, Precision medicine. [Paper]

    A KG that focuses on gene- and protein-centric information and their interactions. It provides a unified view of biological knowledge and enables the discovery of new insights in the biomedical field.
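
The KGs listed above all describe biomedical entities and the relationships between them. As a rough illustration of how such a graph might be queried for drug repurposing, here is a minimal sketch that stores a KG as (head, relation, tail) triples and looks for drugs that bind a gene associated with a disease but are not yet linked to that disease. The triples, the relation names (`binds`, `associates`, `treats`), and the function names are all hypothetical and are not taken from any of the resources above; real KGs use their own schemas and are far larger.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail). Entity and relation names
# are illustrative only and do not come from any KG listed in this README.
TRIPLES = [
    ("drug:D1", "binds", "gene:G1"),
    ("drug:D2", "binds", "gene:G2"),
    ("drug:D3", "binds", "gene:G1"),
    ("gene:G1", "associates", "disease:X"),
    ("gene:G2", "associates", "disease:Y"),
    ("drug:D1", "treats", "disease:X"),
]


def build_index(triples):
    """Index triples as (head, relation) -> set of tails for fast lookup."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
    return index


def repurposing_candidates(triples, disease):
    """Return drugs that bind a gene associated with `disease`
    but have no existing 'treats' edge to that disease."""
    index = build_index(triples)
    drugs = {head for head, relation, _ in triples if relation == "binds"}
    candidates = set()
    for drug in drugs:
        already_treats = disease in index[(drug, "treats")]
        for gene in index[(drug, "binds")]:
            if disease in index[(gene, "associates")] and not already_treats:
                candidates.add(drug)
    return sorted(candidates)


print(repurposing_candidates(TRIPLES, "disease:X"))  # ['drug:D3']
```

A two-hop path query like this is only the simplest way to use a KG. The KaGML approaches covered in the tutorial instead feed such graph structure into learned models.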