Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review

🔔 News

  • 💥 [2023/11/06] Our review paper is available here.
  • ✨ [2023/11/03] We created this repository to maintain a paper list on Large Language Models (LLMs) in Medicine.

Introduction

In the fast-evolving landscape of artificial intelligence, Large Language Models (LLMs) have emerged as groundbreaking tools with the potential to emulate complex human linguistic abilities. Their profound impact on healthcare, a field at the crossroads of multifaceted data and intricate decision-making, is of immense interest. This repository delves into the integration challenges and showcases the breadth of LLMs' applications within the medical sphere.

This repository curates papers on general-purpose and specialized LLMs and their roles in enhancing medical research, streamlining clinical operations, and supporting diagnostic processes. It highlights multimodal LLMs, which combine varied data streams such as medical images and electronic health records (EHRs) to refine diagnostic precision. It also covers LLM-powered autonomous healthcare agents, with attention to their capacity for personalized care and complex clinical reasoning. Finally, it collects evaluation strategies for verifying the dependability and security of LLMs in medical settings.

Our analysis highlights the transformative promise LLMs hold for the future of healthcare. We also stress that ongoing refinement and ethical vigilance are prerequisites for their successful clinical integration.

Please note: This repository's scope is centered on the technological evolution of LLMs in medicine. For insights into clinical deployments and applications of LLMs, we invite you to consult our comprehensive review.

We sincerely value all contributions, whether through pull requests, issue reports, emails, or other forms of communication.

Table of Contents (ToC)

Evaluating General-Purpose LLMs in Medicine via Prompting

  • [2023/11] Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine Harsha Nori et al. arXiv. [paper]
  • [2023/10] Exploring the Boundaries of GPT-4 in Radiology Liu et al. EMNLP 2023 main. [paper]
  • [2023/08] Evaluating large language models on medical evidence summarization. Liyan Tang et al. npj Digital Medicine. [paper]
  • [2023/07] Evaluating Large Language Models for Radiology Natural Language Processing. Zhengliang Liu et al. arXiv. [paper]
  • [2023/07] Advanced prompting as a catalyst: Empowering large language models in the management of gastrointestinal cancers Jiajia Yuan et al. The Innovation Medicine. [paper]
  • [2023/04] Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding. Yuqing Wang et al. arXiv. [paper]

Specialized Medical LLMs

  • [2023/12] Towards Accurate Differential Diagnosis with Large Language Models Daniel McDuff et al. arXiv. [paper]
  • [2023/11] MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Zeming Chen et al. arXiv. [paper][code]
  • [2023/11] Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks Ling Luo et al. arXiv. [paper][code]
  • [2023/11] HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs Junying Chen et al. arXiv. [paper][code]
  • [2023/10] AlpaCare: Instruction-tuned Large Language Models for Medical Application Zhang et al. arXiv. [paper] [code]
  • [2023/10] ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data Zhong et al. arXiv. [paper]
  • [2023/10] Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes Sunjun Kweon et al. arXiv. [paper] [code]
  • [2023/09] HealGPT GR-Tech. [code] [demo]
  • [2023/09] MedChatZH: a Better Medical Adviser Learns from Better Instructions Tan et al. arXiv. [paper] [code]
  • [2023/09] CPLLM: Clinical Prediction with Large Language Models Shoham et al. arXiv. [paper]
  • [2023/09] Radiology-Llama2: Best-in-Class Large Language Model for Radiology Liu et al. arXiv. [paper]
  • [2023/08] Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue Songhua Yang et al. arXiv. [paper] [code]
  • [2023/08] DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation Zhijie Bao et al. arXiv. [paper] [code]
  • [2023/08] CareGPT: Medical LLM, Open Source Driven for a Healthy Future Rongsheng Wang et al. [code]
  • [2023/07] HuangDi: A Generative Large Language Model for Ancient Chinese Medical Texts Jundong Zhang et al. [code]
  • [2023/07] MING: A Chinese Medical Consultation Large Model Yusheng Liao et al. [code]
  • [2023/06] TCMLLM Xuezhong Zhou et al. [code]
  • [2023/06] PULSE OpenMedLab. [code]
  • [2023/06] Sunsimiao: Chinese Medicine LLM Xin Yan et al. [code]
  • [2023/06] ShenNong-TCM: A Traditional Chinese Medicine Large Language Model Wei Zhu et al. [code]
  • [2023/06] Radiology-GPT: A Large Language Model for Radiology Zhengliang Liu et al. arXiv. [paper] [code]
  • [2023/06] MedicalGPT: Training Medical GPT Model Ming Xu et al. [code]
  • [2023/06] ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation Guangyu Wang et al. arXiv. [paper]
  • [2023/05] CAMEL: Clinically Adapted Model Enhanced from LLaMA Sunjun Kweon et al. [code] [blog]
  • [2023/05] Clinfo.AI: Answer Clinical Questions Grounded in Medical Literature Alejandro Lozano et al. [code]
  • [2023/05] Towards Expert-Level Medical Question Answering with Large Language Models Karan Singhal et al. arXiv. [paper]
  • [2023/05] CMLM-ZhongJing: Large Language Model is Good Story Listener Yanlan Kang et al. [code]
  • [2023/05] QiZhenGPT: An Open Source Chinese Medical Large Language Model Yao Chang et al. [code]
  • [2023/05] HuatuoGPT, towards Taming Language Model to Be a Doctor Hongbo Zhang et al. arXiv. [paper] [code]
  • [2023/05] Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding Toma et al. arXiv. [paper] [code]
  • [2023/04] BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT Yirong Chen et al. arXiv. [paper] [code]
  • [2023/04] ChatMed: A Chinese Medical Large Language Model Wei Zhu et al. [code]
  • [2023/04] DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task Honglin Xiong et al. arXiv. [paper] [code]
  • [2023/04] PMC-LLaMA: Towards Building Open-source Language Models for Medicine Chaoyi Wu et al. arXiv. [paper] [code]
  • [2023/04] HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge Haochun Wang et al. arXiv. [paper] [code]
  • [2023/04] Doctor Dignity Siraj Raval et al. [code]
  • [2023/04] MedAlpaca--An Open-Source Collection of Medical Conversational AI Models and Training Data Tianyu Han et al. arXiv. [paper] [code]
  • [2023/03] Palmyra-Large Parameter Autoregressive Language Model Writer Engineering team. [code]
  • [2023/03] ChatGLM-Med Haochun Wang et al. [code]
  • [2023/03] ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge Yunxiang Li et al. Cureus. [paper] [code]
  • [2022/12] A large language model for electronic health records Xi Yang et al. npj Digital Medicine. [paper] [code]
  • [2022/12] Large language models encode clinical knowledge Karan Singhal et al. Nature. [paper] [code]
  • [2022/10] Health system-scale language models are all-purpose prediction engines Lavender Yao Jiang et al. Nature. [paper] [code]

Multimodal LLMs in Medicine

  • [2023/12] A Foundational Multimodal Vision Language AI Assistant for Human Pathology Ming Y. Lu et al. arXiv. [paper]
  • [2023/10] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare Junling Liu et al. arXiv. [paper] [code]
  • [2023/08] ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders Shawn Xu et al. arXiv. [paper]
  • [2023/08] Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data Chaoyi Wu et al. arXiv. [paper] [code]
  • [2023/08] BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine Yizhen Luo et al. arXiv. [paper] [code]
  • [2023/07] Multimodal LLMs for health grounded in individual-specific data Anastasiya Belyaeva et al. arXiv. [paper] [blog]
  • [2023/07] Med-Flamingo: a Multimodal Medical Few-shot Learner Michael Moor et al. arXiv. [paper] [code]
  • [2023/07] Towards Generalist Biomedical AI Tao Tu et al. arXiv. [paper] [code]
  • [2023/07] CephGPT-4: An Interactive Multimodal Cephalometric Measurement and Diagnostic System with Visual Large Language Model Lei Ma et al. arXiv. [paper]
  • [2023/06] LMFlow: An extensible toolkit for finetuning and inference of large foundation models Shizhe Diao et al. arXiv. [code] [paper] [blog]
  • [2023/06] XrayPULSE OpenMedLab. [code]
  • [2023/06] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models Omkar Thawkar et al. arXiv. [paper] [code]
  • [2023/06] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day Chunyuan Li et al. arXiv. [paper] [code]
  • [2023/05] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering Xiaoman Zhang et al. arXiv. [paper] [code]
  • [2023/05] BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks Kai Zhang et al. arXiv. [paper] [code]
  • [2023/05] PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology Yuxuan Sun et al. arXiv. [paper] [code]
  • [2023/05] XrayGLM: The first Chinese Medical Multimodal Model that Chest Radiographs Summarization Rongsheng Wang et al. [code]
  • [2023/04] SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model Juexiao Zhou et al. arXiv. [paper] [code]
  • [2023/04] Visual Med-Alpaca: A Parameter-Efficient Biomedical LLM with Visual Capabilities Chang Shu et al. [blog] [code]

GPT-4V

  • [2023/12] Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies Pengcheng Chen et al. arXiv. [paper]
  • [2023/11] Performance of Multimodal GPT-4V on USMLE with Image: Potential for Imaging Diagnostic Support with Explanations Yang et al. medRxiv. [paper]
  • [2023/10] Diagnostic Accuracy of GPT Multimodal Analysis on USMLE Questions Including Text and Visuals Sorin et al. medRxiv. [paper]
  • [2023/10] Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V Yan et al. arXiv. [paper]
  • [2023/10] A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging Li et al. arXiv. [paper]
  • [2023/10] Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis Wu et al. arXiv. [paper]
  • [2023/09] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) Yang et al. arXiv. [paper]

LLM-Powered Healthcare Agents

  • [2023/11] MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning Tang et al. arXiv. [paper] [code]
  • [2023/10] Conversational Health Agents: A Personalized LLM-Powered Agent Framework Mahyar Abbasian et al. arXiv. [paper]
  • [2023/07] PharmacyGPT: The AI Pharmacist Zhengliang Liu et al. arXiv. [paper]
  • [2023/07] Advanced prompting as a catalyst: Empowering large language models in the management of gastrointestinal cancers Jiajia Yuan et al. The Innovation Medicine. [paper]
  • [2023/06] AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology Haixing Dai et al. arXiv. [paper]
  • [2023/05] ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs Zihao Zhao et al. arXiv. [paper] [code]
  • [2023/05] PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology Yuxuan Sun et al. arXiv. [paper] [code]
  • [2023/04] ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT Chong Ma et al. arXiv. [paper] [code]
  • [2023/04] GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information Jin et al. arXiv. [paper]
  • [2023/03] Almanac: Retrieval-Augmented Language Models for Clinical Medicine Zakka et al. Research Square. [paper]

Evaluation

Strategies

  • [2023/09] An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models Liao et al. arXiv. [paper]
  • [2023/09] SafetyBench: Evaluating the safety of large language models with multiple choice questions. Zhexin Zhang et al. arXiv. [paper] [code]
  • [2023/08] LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation Xiaoming Shi et al. arXiv. [paper]
  • [2023/07] Med-HALT: Medical Domain Hallucination Test for Large Language Models. Ankit Pal et al. EMNLP'23. [paper] [code]
  • [2023/06] Faithful AI in Medicine: A Systematic Review with Large Language Models and Beyond. Qianqian Xie et al. medRxiv. [paper]
  • [2023/05] MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine Jie Xu et al. arXiv. [paper]
  • [2023/05] Can Large Language Models Be an Alternative to Human Evaluations? Cheng-Han Chiang et al. ACL'23. [paper]
  • [2023/04] Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study Yi Chen et al. arXiv. [paper]
  • [2023/03] G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment Yang Liu et al. arXiv. [paper] [code]
  • [2023/02] GPTScore: Evaluate as You Desire Jinlan Fu et al. arXiv. [paper] [code]

Valuable Resources

Related Surveys

LLM Techniques

  • [2023/09] The Rise and Potential of Large Language Model Based Agents: A Survey. Zhiheng Xi et al. arXiv. [paper] [code]
  • [2023/08] Instruction Tuning for Large Language Models: A Survey. Shengyu Zhang et al. arXiv. [paper] [code]
  • [2023/07] A Survey on Evaluation of Large Language Models. Yupeng Chang et al. arXiv. [paper] [code]
  • [2023/07] Aligning Large Language Models with Human: A Survey. Yufei Wang et al. arXiv. [paper] [code]
  • [2023/06] A Survey on Multimodal Large Language Models. Shukang Yin et al. arXiv. [paper] [code]
  • [2023/04] Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. Jingfeng Yang et al. arXiv. [paper]
  • [2023/03] A survey of large language models. Wayne Xin Zhao et al. arXiv. [paper] [code]
  • [2023/03] Language Model Behavior: A Comprehensive Survey. Tyler A. Chang, Benjamin K. Bergen. arXiv. [paper]
  • [2023/02] Augmented Language Models: a Survey. Grégoire Mialon et al. arXiv. [paper]
  • [2022/12] Towards Reasoning in Large Language Models: A Survey. Jie Huang, Kevin Chen-Chuan Chang. ACL'23 Findings. [paper]

LLMs in Medicine

  • [2023/11] A Survey of Large Language Models in Medicine: Progress, Application, and Challenge. Hongjian Zhou et al. arXiv. [paper]
  • [2023/10] A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. Kai He et al. arXiv. [paper]
  • [2023/09] Large language models in medicine: the potentials and pitfalls. Jesutofunmi A. Omiye et al. arXiv. [paper]
  • [2023/09] Artificial General Intelligence for Radiation Oncology. Chenbin Liu et al. arXiv. [paper]
  • [2023/07] Large language models in medicine. Arun James Thirunavukarasu et al. Nature Medicine. [paper]
  • [2023/05] The current and future state of AI interpretation of medical images. Pranav Rajpurkar and Matthew P. Lungren. New England Journal of Medicine. [paper]
  • [2023/04] Utility of ChatGPT in Clinical Practice. Jialin Liu et al. Journal of Medical Internet Research. [paper]
  • [2023/04] Foundation models for generalist medical artificial intelligence. Michael Moor et al. Nature. [paper]
  • [2023/03] Large AI Models in Health Informatics: Applications, Challenges, and the Future. Jianing Qiu et al. IEEE Journal of Biomedical and Health Informatics. [paper]
  • [2023/03] ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Malik Sallam. Healthcare. [paper]
  • [2023/03] ChatGPT in Healthcare: A Taxonomy and Systematic Review. Jianning Li et al. medRxiv. [paper]

Repositories

Project Maintainers & Contributors

Citing

If you find this repository useful in your research, please consider citing it.

@article{yuan2023large,
  title={Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review},
  author={Yuan, Mingze and Bao, Peng and Yuan, Jiajia and Shen, Yunhao and Chen, Zifan and Xie, Yi and Zhao, Jie and Chen, Yang and Zhang, Li and Shen, Lin and others},
  journal={arXiv preprint arXiv:2311.01918},
  year={2023}
}

Licenses

This project is licensed under the terms of the MIT License.

Acknowledgement

We structured this repository by drawing inspiration from repositories such as LLM-Agent-Paper-List and CareGPT, and from insights in RadLLM. We sincerely thank their authors for these contributions.