A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

🔔News
🔔Table of Contents
🔔Important Tables and Figures
🔔Citation

News

2023-05-31 update.new paper "Polaris: A Safety-focused LLM Constellation Architecture for Healthcare"
2023-05-31 update.new paper "Medical mT5: an open-source multilingual text-to-text LLM for the medical domain"
2023-05-31 update.new paper "Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People"
2023-05-31 update.new paper "LLM-CXR: INSTRUCTION-FINETUNED LLM FOR CXR IMAGE UNDERSTANDING AND GENERATION"
2023-05-31 update.new paper "Me LLaMA: Foundation large language models for medical applications"
2023-05-31 update.new paper "BioMistral: A collection of open-source pretrained large language models for medical domains"
2023-05-31 update.new paper "OncoGPT: A medical conversational model tailored with oncology domain expertise on a large language model Meta-AI (LLaMA)"
2023-03-17 update.new paper "Health-LLM: Personalized Retrieval-Augmented Disease Prediction System"
2023-03-17 update.new paper "HealAI: A Healthcare LLM for Effective Medical Documentation"
2023-03-17 update.new paper "BiMediX: Bilingual Medical Mixture of Experts LLM"
2023-03-17 update.new paper "JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability"
2023-03-17 update.new paper "MedChatZH: A tuning LLM for traditional Chinese medicine consultation"
2023-10-18 added new paper "Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue".
2023-10-18 added new paper "Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model".
2023-10-9 We release the version 1 of the survey (https://arxiv.org/abs/2310.05694).

Introduction
What LLMs Can Do for Healthcare? From Fundamental Tasks to Advanced Applications
- NER and RE for Healthcare Alpacare
- Text Classification for Healthcare
- Semantic Textual Similarity for Healthcare
- Question Answering for Healthcare
- Dialogue System for Healthcare
- Generation of Medical Reports from Images
- Summary
From LMs to LLMs for Healthcare
- LMs for Healthcare
- LLMs for Healthcare
Train and Use LLM for Healthcare
- Pre-training Methods
- Masked Language Modeling
- Next Word Prediction
- Sequence-to-sequence MLM
- Replaced Token Detection
- Sentence Boundary Detection
- Next Sentence Prediction
- Sentence Order Prediction
Post-training Methods
- From predicting tokens to follow instructions: Instruction Fine-Tuning and Supervised Fine-tuning
- Reinforced Learning from Human Feedback
- From Human Feedback to AI Feedback
- Summary
Usage
- From Fine-tuning to In-context Learning
- From System 1 Deep Learning To System 2 Deep Learning: Chain-of-Thought
- AI Agents
- Summary
Parameters-, Memory-, and Compute-efficient Methods
- Parameters-efficient Methods
- Compute-efficient and Memory-efficient Methods
Useful Resources
- OpenBMB
- DeepSpeed Chat
- Training Data
- Summary
Evaluation Method
- General NLP tasks Evaluation
- Healthcare Evaluation
- Evaluation of Robustness, Bias, and Ethics
- Future Directions for Health Evaluation
- Summary
Improving Fairness, Accountability, Transparency, and Ethics
- Fairness
- Accountability
- Transparency
- Ethics
Future work and Conclusion
- Future Work
- Medical knowledge enhancement
- Integration with Healthcare process
- Effective Interaction with Patients and Doctors
- Hallucinations, Misunderstandings and Prompt Brittleness
Conclusion

Important Tables and Figures

Fig. 2. The organizational framework for the content. Section III, Section IV, Section V are technology details, while Section II, Section VI and Section VI are more valued for Healthcare professionals

LLM Information

Model Name	Base	Para. (B)	Features	Date	Link
GatorTron	Transformer	0.345, 3.9, 8.9	Training from scratch	06/2022	https://github.com/uf-hobi-informatics-lab/GatorTron
Codex-Med	GPT-3.5	175	CoT, Zero-shot	07/2022	https://github.com/vlievin/medical-reasoning
Galactica	Transformer	1.3, 6.4, 30, 120	Reasoning, Multidisciplinary	11/2022	https://galactica.org
Med-PaLM	Flan-PaLM/PaLM	540	CoT, Self-consistency	12/2022	-
GPT-4-Med	GPT-4	-	no specialized prompt crafting	03/2023	-
DeID-GPT	GPT-4	-	De-identifying	03/2023	https://github.com/yhydhx/ChatGPT-API
ChatDoctor	LLaMA	7	Retrieve online, external knowledge	03/2023	https://github.com/Kent0n-Li/ChatDoctor
DoctorGLM	ChatGLM	6	Extra prompt designer	04/2023	https://github.com/xionghonglin/DoctorGLM
MedAlpaca	LLaMA	7, 13	Adapt to Medicine	04/2023	https://github.com/kbressem/medAlpaca
BenTsao	LLaMA	7	Knowledge graph	04/2023	https://github.com/SCIR-HI/ Huatuo-Llama-Med-Chinese
PMC-LLaMA	LLaMA	7	Adapt to Medicine	04/2023	https://github.com/chaoyi-wu/PMC-LLaMA
Visual Med-Alpaca	LLaMA	7	multimodal generative model, Self-Instruct	04/2023	https://github.com/cambridgeltl/visual-med-alpaca
BianQue~	ChatGLM	6	Chain of Questioning	04/2023	https://github.com/scutcyr/BianQue
Med-PaLM 2	PaLM 2	340	Ensemble refinement, CoT, Self-consistency	05/2023	-
GatorTronGPT	GPT-3	5, 20	Training from scratch for medicine	05/2023	https://github.com/uf-hobi-informatics-lab/GatorTronGPT
HuatuoGPT	Bloomz	7	Reinforced learning from AI feedback	05/2023	https://github.com/FreedomIntelligence/HuatuoGPT
ClinicalGPT	BLOOM	7	multi-round dialogue consultations	06/2023	-
MedAGI	MiniGPT-4	-	multimodal, AGI	06/2023	https://github.com/JoshuaChou2018/MedAGI
LLaVA-Med	LLaVA	13	multimodal, self-instruct, curriculum learning	06/2023	https://github.com/microsoft/LLaVA-Med
OphGLM	ChatGLM	6	multimodal, Ophthalmology LLM	06/2023	https://github.com/ML-AILab/OphGLM
SoulChat	ChatGLM	6	Mental Healthcare	06/2023	https://github.com/scutcyr/SoulChat
Med-Flamingo	Flamingo	80B	multimodal, Few-Shot generative medical VQA	07/2023	https://github.com/snap-stanford/med-flamingo

PLM Information

TABLE I BRIEF SUMMARIZATION OF EXISTING PLMS FOR HEALTHCARE.

Model Name	Base	Para. (B)	Features	Date	Link
BioBERT	BERT	0.34	Biomedical Adaption	05/2019	https://github.com/naver/biobert-pretrained
BlueBERT	BERT	0.34	Biomedical Benchmark	06/2019	https://github.com/ncbi-nlp/BLUE\_Benchmark
MIMIC-BERT	BERT	0.34	Clinical Concept Extraction	08/2019	-
BioFLAIR~	BERT	0.34	Less Computationally Intensive	08/2019	https://github.com/zalandoresearch/flair
Bio-ELECTRA-small	ELECTRA	0.03	Training From Scratch	03/2020	-
AlphaBERT	BERT	0.11	Character-level	04/2020	https://github.com/wicebing/AlphaBERT.git
Spanish-bert	BERT	-	Spanish	04/2020	-
GreenCovidSQuADBERT	BERT	0.34	CPU-only, CORD-19	04/2020	https://github.com/npoe/covid-qa
BEHRT	Transformer	-	Training From Scratch	04/2020	https://github.com/deepmedicine/BEHRT
BioMed-RoBERTa	RoBERTa	0.11	Biomedical Adaption	05/2020	https://github.com/allenai/dont-stop-pretraining
RadBERT~	BERT	-	RadCore Radiology Reports	05/2020	-
CT-BERT~	BERT	0.34	COVID-19	05/2020	https://github.com/digitalepidemiologylab/covid-twitter-bert
French-BERT	BERT	0.11	French Language Models	06/2020	-
FS-/RAD-/GER-BERT	BERT	0.11	Chest Radiograph Reports	07/2020	https://github.com/fast-raidiology/bertfor-radiology
Japanese-BERT	BERT	10.11	Japanese Clinical Narrative	07/2020	ai-health.m.u-tokyo.ac.jp/home/research/uth-bert
MC-BERT	BERT	0.11	Chinese Biomedical Benchmark	08/2020	https://github.com/alibabaresearch/ChineseBLUE
BioALBERT-ner	ALBERT	0.18	Biomedical NER	09/2020	https://github.com/usmaann/BioALBERT
BioMegatron	Megatron	1.2	Training From Scratch	10/2020	https://github.com/NVIDIA/NeMo
CharacterBERT	BERT	0.11	Character-CNN module	10/2020	https://github.com/helboukkouri/character-bert
ClinicalBert	BERT	0.11	For Predicting Hospital Readmission	11/2020	https://github.com/kexinhuang12345/clinicalBERT
Clinical XLNet	XLNet	0.11	Temporal Information	11/2020	https://github.com/lindvalllab/clinicalXLNet
Bio-LM	RoBERTa	0.34	Biomedical Adaption	11/2020	https://github.com/facebookresearch/bio-lm
BioBERTpt	BERT	0.11	Portuguese Clinical	11/2020	https://github.com/HAILab-PUCPR/BioBERTpt
RoBERTa-MIMIC	RoBERTa	0.11	Clinical Concept Extraction	12/2020	https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER
Clinical KB-ALBERT	ALBERT	0.03	Introducing Medical KB	12/2020	https://github.com/noc-lab/clinical-kb-bert
CHMBERT	BERT	0.11	Chinese Medical, Cloud Computing	01/2021	-
PubMedBERT	BERT	0.11	Training From Scratch	01/2021	https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
ouBioBERT	BERT	0.11	Up-sampling, Amplified Vocabulary	02/2021	https://github.com/sy-wada/blue\_benchmark\_with\_transformers
BERT-EHR	BERT	-	Depression,Chronic Disease Prediction	03/2021	https://github.com/lanyexiaosa/brltm
AraBERT	BERT	0.11	Arabic Language	03/2021	https://github.com/aub-mind/araBERT
ABioNER	BERT	0.11	Arabic NER	03/2021	-
ELECTRAMed	ELECTRA	0.11	Biomedical Adaption	04/2021	https://github.com/gmpoli/electramed
KeBioLM	PubMedBERT	0.11	Introducing Medical KB	04/2021	https://github.com/GanjinZero/KeBioLM
SINA-BERT	BERT	0.11	Persian Language	04/2021	-
Med-BERT	BERT	0.11	Stay Length Prediction	05/2021	https://github.com/ZhiGroup/MedBERT
Galén	RoBERTa	0.11	Spanish Language	05/2021	https://github.com/guilopgar/ClinicalCodingTransformerES
SCIFIVE~	T5	0.77	Biomedical Text Generation	05/2021	https://github.com/justinphan3110/SciFive
BioELECTRA	ELECTRA	0.34	Training From Scratch	06/2021	https://github.com/kamalkraj/BioELECTRA
UmlsBERT	BERT	0.11	Introducing Medical KB	06/2021	https://github.com/gmichalo/UmlsBERT
MedGPT	GPT-2	1.5	Temporal Modelling	07/2021	-
MentalBERT	BERT	0.11	Mental Healthcare	10/2021	https://huggingface.co/mental
CODER	mBERT	0.34	Cross-lingual, Introducing Medical KB	02/2022	https://github.com/GanjinZero/CODER
BioLinkBERT~	BERT	0.34	PubMed with Citation Links	03/2022	https://github.com/michiyasunaga/LinkBERT
BioALBERT	ALBERT	0.03	Biomedical Adaption	04/2022	https://github.com/usmaann/BioALBERT
BioBART~	BART	0.4	Biomedical NLG	04/2022	https://github.com/GanjinZero/BioBART
SAPBERT	BERT	0.11	Self-Alignment Pretraining	10/2022	https://github.com/cambridgeltl/sapbert
VPP	BART	0.14	Soft prompt, Biomedical NER	03/2023	https://github.com/KaiHe-better/VPP
KAD	BERT	-	Multimodal, Chest Radiology Images	03/2023	https://github.com/xiaoman-zhang/KAD

TABLE II SUMMARIZATION OF TRAINING DATA AND EVALUATION TASKS FOR EXISTING PLMS FOR HEALTHCARE.

Model Name	Method	Training Data	Eval task
BioBERT	FT	PubMed, PMC	Biomedical NER, RE, QA
BlueBert	FT	PubMed, MIMIC-III	BLUE
MIMIC-BERT	FT	MIMIC-III	Biomedical NER
BioFLAIR~	FT	PubMed	Bio NER
Bio-ELECTRA-small	PT	PubMed	Biomedical NER
AlphaBERT	FT	Discharge diagnoses	Extractive Summarization Task
Spanish-bert	FT	Spanish	Spanish Clinical Case Corpus
GreenCovidSQuADBERT	FT	CORD19, PubMed, PMC	NER, QA
BEHRT	PT	CPRD, HES	Disease Prediction
BioMed-RoBERTa	FT	BIOMED	CHEMPROT, RCT
RadBERT~	FT	Radiology Report Corpus	Report Coding, Summarization
CT-BERT~	FT	Tweet	COVID-19 Text Classification
French-BERT	FT	French clinical documents	DEFT challenge
FS-/RAD-/GER-BERT	FT,PT	Unstructured radiology reports	Chest Radiograph Reports Classification
Japanese-BERT	FT	Japanese EHR	Symptoms Classification
MC-BERT	FT	Chinese EHR	Chinese Biomedical Evaluation benchmark
BioALBERT-ner	FT	PubMed, PMC	Biomedical NER
BioMegatron	PT	PubMed	biomedical NER, RE, QA
CharacterBERT	Bert	OpenWebText, MIMIC-III, PMC	Medical NER, NLI, RE, SS
ClinicalBert	FT	MIMIC-III	Hospital Readmission Prediction
Clinical XLNet	FT	MIMIC-III	PMV, Mortality
Bio-LM	FT	PubMed, PMC, MIMIC-III	18 Biomedical NLP Tasks
BioBERTpt	FT	Private clinical notes, WMT16	SemClinBr
RoBERTa-MIMIC	FT	i2b2 2010, 2012, n2c2 2018	i2b2 2010, 2012, N2C2 2018
Clinical KB-ALBERT	FT	MIMIC-III, UMLS	MedNLI, i2b2 2010, 2012
CHMBERT	FT	Medical text data	Disease Prediction
PubMedBERT	PT	PubMed	BLURB
ouBioBERT	FT	PubMed, Wikipedia	BLUE
BERT-EHR	FT	General EHR	Myocardial Infarction, Breast Cancer, Liver Cirrhosis
AraBERT	PT	Arabic Wikipedia, OSIAN	Arabic SA, NER, QA
ABioNER	FT	Arabic scientific literature	Arabic NER
ELECTRAMed	FT	PubMed	Biomedical NER, RE, and QA
KeBioLM	FT	PubMed	BLURB
SINA-BERT	FT	Online Persian source	Persian QA, SA
Med-BERT	FT	General EHR	Disease prediction
Galén	FT	Private clinical cases	CodiEsp-D, CodiEsp-P, Cantemist-Coding tasks
SCIFIVE~	T5	PubMed, PMC	Biomedical NER, RE, NIL, QA
BioELECTRA	PT	PubMed, PMC	BLURB, BLUE
UmlsBERT	FT	MIMIC-III	MedNLI, i2b2 2006,2010, 2012, 2014
MedGPT	FT	MIMIC-III, private EHRs	Disorder Prediction
MentalBERT	FT	Reddit	Depression Stress, Suicide Detection,
CODER	FT	UMLS	MCSM, Medical RE
BioLinkBERT~	FT	PubMed	BLURB, USMLE
BioALBERT	FT	PubMed, PMC, MIMIC-III	6 BioNLP Tasks
BioBART~	FT	PubMed	Biomedical EL, NER, QA, Dialogue, Summarization
SAPBERT	FT	UMLS	MEL
VPP	FT	PubMed	Biomedical NER
KAD	FT	MIMIC-CXR	PadChest, ChestXray14, CheXpert and ChestX-Det10

Availble Training Data

Data	Type	size	Link
MIMIC-III	EHR	58,976 hospital admissions for 38,597 patients	https://mimic.mit.edu/docs/iii/
MIMIC-IV	EHR	covering a decade of admissions between 2008 and 2019	https://mimic.mit.edu/docs/iv/
CPRD	EHR	over 2,000 primary care practices and include 60 million patients	https://cprd.com/data
PubMed	Scientific Literature	35M citations and abstracts of biomedical literature	https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/
PMC	Scientific Literature	8 million full-text article records	https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk
RCT	Scientific Literature	4,528 abstract	https://github.com/bwallace/RCT-summarization-data
MS$\hat{~}$2	Scientific Literature	470,402 abstract	https://github.com/allenai/ms2/
CDSR	Scientific Literature	7,805 abstract	https://github.com/qiuweipku/Plain\_language\_summarization
SumPubMed	Scientific Literature	33,772 abstract	https://github.com/vgupta123/sumpubmed
The Pile	Scientific Literature	825 GB English text	https://pile.eleuther.ai/
S2ORC	Scientific Literature	63,709 abstract	https://github.com/jbshp/GenCompareSum
CORD-19	Scientific Literature	1M papers	https://github.com/allenai/cord19
MeQSum	Medical Question Summarization	1000 instances	https://github.com/abachaa/MeQSum
CHQ-Sum	Medical Question Summarization	1507 instances	https://github.com/shwetanlp/Yahoo-CHQ-Summ
UMLS	Knowledge Base	2M entities for 900K concepts	https://www.nlm.nih.gov/research/umls/index.html
COMETA	Web Data (social media)	800K Reddit posts	https://github.com/cambridgeltl/cometa
MedDialog	Dialogue	3.66 million conversations	https://github.com/UCSD-AI4H/COVID-Dialogue
CovidDialog	Dialogue	603 consultations	https://github.com/UCSD-AI4H/COVID-Dialogue
Medical Flashcards	Dialogue	33955 instances	https://github.com/kbressem/medalpaca
Wikidoc	Dialogue	67704 instances	https://huggingface.co/datasets/medalpaca/medical\_meadow\_wikidoc
Wikidoc Patient Information	Dialogue	5942 instances	https://huggingface.co/datasets/medalpaca/medical\_meadow\_wikidoc\_patient\_information
MEDIQA	Dialogue	2208 instances	https://huggingface.co/datasets/medalpaca/medical\_meadow\_wikidoc\_patient\_information
CORD-19	Dialogue	1056660 instances	https://huggingface.co/datasets/medalpaca/medical\_meadow\_cord19
MMMLU	Dialogue	3787 instances	https://huggingface.co/datasets/medalpaca/medical\_meadow\_mmmlu
Pubmed Causal	Dialogue	2446 instances	https://huggingface.co/datasets/medalpaca/medical\_meadow\_pubmed\_causal
ChatDoctor	Dialogue	215000 instances	https://github.com/Kent0n-Li/ChatDoctor
Alpaca-EN-AN	English Instructions	52K instructions	https://github.com/tatsu-lab/stanford\_alpaca/blob/main/alpaca\_data.json
Alpaca-CH-AN	Chinese Instructions	52K instructions	https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/tree/main/data
ShareGPT	Conversations	61653 long conversations	https://huggingface.co/datasets/philschmid/sharegpt-raw
WebText	Web Data	40 GB of text	https://commoncrawl.org/the-data/get-started/
OpenWebText	Web Data	38 GB of text	https://skylion007.github.io/OpenWebTextCorpus/
Colossal Clean Crawled Corpus	Web Data	806 GB of text	https://www.tensorflow.org/datasets/catalog/c4
OpenI	EHR, Multimodel	3.7 million images from about 1.2 million papers	https://openi.nlm.nih.gov/faq\#collection
U-Xray	Multimodel	3,955 reports and 7,470 images	https://openi.nlm.nih.gov/
ROCO	Multimodel	81,000 radiology images and corresponding captions	https://github.com/razorx89/roco-dataset
MedICaT	Multimodel	17,000 images includes captions	https://github.com/allenai/medicat
PMC-OA	Multimodel	1.6M image-caption pairs	https://huggingface.co/datasets/axiong/pmc\_oa\_beta
CheXpert	Multimodel	224,316 chest radiographs with associated reports	https://aimi.stanford.edu/chexpert-chest-x-rays
PadChest	Multimodel	160,000 images with related text	http://bimcv.cipf.es/bimcv-projects/padchest/
MIMIC-CXR	Multimodel	227,835 imaging studies for 64,588 patients	https://mimic.mit.edu/docs/iv/modules/cxr/
PMC-15M	Multimodel	15 million Figure-caption
pairs	https://arxiv.org/abs/2303.00915
OpenPath	Multimodel	208,414 pathology images related descriptions	https://laion.ai/blog/laion-5b/

The Statistics of Computation Cost

TABLE VIII THE STATISTICS OF COMPUTATION COST FOR EXISTING HEALTHCARE LLM.

Model Name	Total data size	epoch	Batch size	GPU type	GPU number	GPU time
Visual Med-Alpaca	54k data points	3	128	A100-80G	4	2.51 hours
GatorTron	\textgreater 90 billion words	10	-	A100	992	6 days
Galactica	-	-	-	A100-80G	128	-
ChatDoctor	100k conversations	3	192	A100	6	3 hours
DoctorGLM	3.5G	1	4	A100-80G	1	8 hours
PMC-LLaMA	75B tokens	5	128	A100	8	7 days
Visual Med-Alpaca	44.8MB* (without images)	-	128	A100-80G	4	2.51 hours
BianQue 1.0	9 million samples	1	-	RTX 4090	8	16 days
GatorTronGPT	277B tokens		1,120/560	A100-80G	560	26 days
HuatuoGPT	226,042 instances	3	128	A100	8	-
LLaVA-Med	15 million figure-caption pairs	-	-	A100	8	15 hours
Med-Flamingo	1.3M image-caption pairs	-	400	A100-80G	8	6.75 days

TABLE IX ESTIMATED FLOPS AND TRAINING TOKENS FOR DIFFERENT MODEL SIZES.

Parameters	FLOPs	FLOPs (in Gopher unit)	Tokens
400 Million	1.92e+19	1/29, 968	8.0 Billion
1 Billion	1.21e+20	1/4, 761	20.2 Billion
10 Billion	1.23e+22	1/46	205.1 Billion
67 Billion	5.76e+23	1	1.5 Trillion
175 Billion	3.85e+24	6.7	3.7 Trillion
280 Billion	9.90e+24	17.2	5.9 Trillion
520 Billion	3.43e+25	59.5	11.0 Trillion
1 Trillion	1.27e+26	221.3	21.2 Trillion
10 Trillion	1.30e+28	22515.9	216.2 Trillion

Citation

@misc{he2023survey,
      title={A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics}, 
      author={Kai He and Rui Mao and Qika Lin and Yucheng Ruan and Xiang Lan and Mengling Feng and Erik Cambria},
      year={2023},
      eprint={2310.05694},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

KaiHe-better/LLM-for-Healthcare