MIT License

LLM4Mol

LLM4Mol (LLM: Large Language Model) is a comprehensive repository that collects and explores studies applying large language models to molecular design, protein research, and materials science. It serves as a central hub for researchers, scientists, and enthusiasts who want to use language models to advance our understanding and applications in these domains. The list covers state-of-the-art techniques, novel approaches, and research papers that apply language models to biomedical text, RNA/DNA, small molecules, peptides, proteins, antibodies, clinical data, chemistry, and materials. Join our community and contribute to the field of LLM4Mol!

🔔 Updating ...

Recommendations and references

Generative AI and Deep Learning for molecular/drug design
https://github.com/AspirinCode/papers-for-molecular-design-using-DL

List of papers about Protein Design using Deep Learning
https://github.com/Peldom/papers_for_protein_design_using_DL

Large Language Models in Chemistry
https://github.com/alxfgh/Large-Language-Models-in-Chemistry

Menu

LLM4Biomedical Text

  • Opportunities and Challenges for ChatGPT and Large Language Models in Biomedicine and Health [2023]
    Tian, Shubo, Qiao Jin, Lana Yeganova, Po-Ting Lai, Qingqing Zhu, Xiuying Chen, Yifan Yang et al.
    arXiv:2306.10070 (2023)

  • Large language models are universal biomedical simulators [2023]
    Schaefer, Moritz, Stephan Reichl, Rob ter Horst, Adele M. Nicolas, Thomas Krausgruber, Francesco Piras, Peter Stepper, Christoph Bock, and Matthias Samwald.
    bioRxiv (2023) | code

  • Fine-tuning large neural language models for biomedical natural language processing [2023]
    Tinn, Robert, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon.
    Patterns 4.4 (2023) | code

  • A Platform for the Biomedical Application of Large Language Models [2023]
    Lobentanzer, Sebastian, and Julio Saez-Rodriguez.
    arXiv:2305.06488 (2023) | code

  • Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations [2023]
    Chen, Qingyu, Jingcheng Du, Yan Hu, Vipina Kuttichi Keloth, Xueqing Peng, Kalpana Raja, Rui Zhang, Zhiyong Lu, and Hua Xu.
    arXiv:2305.16326 (2023) | code

  • BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks [2023]
    Zhang, K., Yu, J., Yan, Z., Liu, Y., Adhikarla, E., Fu, S., ... & Sun, L.
    arXiv:2305.17100 (2023) | code

  • BioMedLM: a Domain-Specific Large Language Model for Biomedical Text [2022]
    Paper | code

LLM4Small Molecule

  • Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [2023]
    Li, Jiatong, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, and Qing Li.
    arXiv:2306.06615 (2023) | code

  • Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language [2023]
    Seidl, Philipp, Andreu Vall, Sepp Hochreiter, and Günter Klambauer.
    arXiv:2303.03363 (2023) | code

  • Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models [2023]
    Fang, Yin, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen.
    arXiv:2306.08018 (2023) | code

LLM4RNA/DNA

  • HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [2023]
    Nguyen, Eric, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, and Chris Ré.
    arXiv:2306.15794 (2023)

  • DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome [2021]
    Ji, Yanrong, Zhihan Zhou, Han Liu, and Ramana V. Davuluri.
    Bioinformatics 37.15 (2021) | code

LLM4Peptide

LLM4Protein

  • Protein-Protein Interaction Prediction is Achievable with Large Language Models [2023]
    Hallee, Logan, and Jason P. Gleghorn.
    bioRxiv (2023)

  • Prediction of virus-host association using protein language models and multiple instance learning [2023]
    Liu, Dan, Francesca Young, David L. Robertson, and Ke Yuan.
    bioRxiv (2023) | code

  • Large language models generate functional protein sequences across diverse families [2023]
    Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr et al.
    Nat Biotechnol (2023) | code

LLM4Antibody

  • On Pre-training Language Model for Antibody [2023]
    Wang, Danqing, Fei Ye, and Hao Zhou.
    ICLR (2023) | code

  • Efficient evolution of human antibodies from general protein language models [2023]
    Hie, Brian L., Varun R. Shanker, Duo Xu, Theodora UJ Bruun, Payton A. Weidenbacher, Shaogeng Tang, Wesley Wu, John E. Pak, and Peter S. Kim.
    Nat Biotechnol (2023) | code

  • AbLang: an antibody language model for completing antibody sequences [2022]
    Olsen, Tobias H., Iain H. Moal, and Charlotte M. Deane.
    Bioinformatics Advances (2022) | code

LLM4Clinical

  • Matching Patients to Clinical Trials with Large Language Models [2023]
    Jin, Qiao, Zifeng Wang, Charalampos S. Floudas, Jimeng Sun, and Zhiyong Lu.
    arXiv:2307.15051 (2023)

  • ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation [2023]
    Wang, Guangyu, Guoxing Yang, Zongxin Du, Longjun Fan, and Xiaohu Li.
    arXiv:2306.09968 (2023)

LLM4Chemistry

  • ChemCrow: Augmenting large-language models with chemistry tools [2023]
    Bran, Andres M., Sam Cox, Andrew D. White, and Philippe Schwaller.
    arXiv:2304.05376 (2023) | code

LLM4Material

  • Large Language Models as Master Key: Unlocking the Secrets of Materials Science with GPT [2023]
    Xie, Tong, Yuwei Wan, Wei Huang, Yufei Zhou, Yixuan Liu, Qingyuan Linghu, Shaozhou Wang, Chunyu Kit, Clara Grazian, and Bram Hoex.
    arXiv:2304.02213 (2023)

  • MatSciBERT: A materials domain language model for text mining and information extraction [2022]
    Gupta, Tanishq, Mohd Zaki, NM Anoop Krishnan, and Mausam.
    npj Comput Mater 8, 102 (2022) | code