This repository collects and tracks existing and upcoming LLM tools and papers in the healthcare and medicine domain.
The table below was last updated on 12 October 2023 and covers GatorTron, BioBERT, PubMedBERT, BioMegatron, ClinicalBERT, and Med-PaLM 1 & 2.

Model | Paper | Code | Complexity | Data | Tasks |
---|---|---|---|---|---|
GatorTron | Link | NVIDIA, Hugging Face | Base: 345M<br>Medium: 3.90B<br>Large: 8.90B | 1. n2c2 NLP datasets<br>2. MedNLI dataset<br>3. emrQA dataset<br>4. MIMIC III dataset<br>5. PubMed dataset<br>6. Wikipedia dataset<br>7. UF clinical notes (closed) | - Concept extraction<br>- Relation extraction<br>- Semantic textual similarity<br>- Natural language inference (NLI)<br>- Question answering |
BioBERT | Link | Github | BERT (Wiki+Books): 1M<br>BioBERT (+PubMed): 1M<br>BioBERT (+PMC): 270K<br>BioBERT (+PubMed, PMC): 470K | 1. English Wikipedia<br>2. BooksCorpus<br>3. PubMed abstracts<br>4. PMC full-text articles | - Named entity recognition (NER)<br>- Relation extraction<br>- Question answering |
PubMedBERT | Link | Hugging Face | Built on BERT: 1M | 1. PubMed abstracts<br>2. PubMed full-text | - NER<br>- Information extraction<br>- Relation extraction<br>- Semantic similarity<br>- Document classification<br>- Question answering |
BioMegatron | Link | Closed | Built on Megatron: 8.3B<br>BioMegatron S: 345M<br>BioMegatron M: 800M<br>BioMegatron L: 1.2B | Megatron:<br>1. Wikipedia<br>2. CC-Stories<br>3. RealNews<br>4. OpenWebtext<br>BioMegatron:<br>5. PubMed abstracts (4.5B)<br>6. PMC full-text (1.6B) | - NER<br>- Relation extraction<br>- Question answering |
ClinicalBERT | Link | Github, Hugging Face | Built on BERT: 1M<br>Built on BioBERT: 1M | MIMIC all clinical notes<br>MIMIC discharge summaries | - NER<br>- Concept extraction<br>- NLI |
Med-PaLM 1 | Link | Closed | Built on PaLM: 540B | MultiMedQA (medical exams and research datasets):<br>1. MedQA<br>2. MedMCQA<br>3. PubMedQA<br>4. LiveQA<br>5. MedicationQA<br>6. MMLU clinical topics<br>HealthSearchQA (curated searched health queries):<br>7. HealthSearchQA | Question answering |
Med-PaLM 2 | Link | Closed | Based on PaLM 2 | 1. MedQA<br>2. MedMCQA<br>3. PubMedQA<br>4. MMLU clinical topics<br>5. HealthSearchQA<br>6. LiveQA<br>7. MedicationQA | Question answering |

Platform | Paper | Code | Tasks | Metrics | Datasets |
---|---|---|---|---|---|
HELM | Link | Github | - Question answering<br>- Information retrieval<br>- Summarization<br>- Sentiment analysis<br>- Reasoning<br>... other 12 tasks | Accuracy, Calibration, Robustness, Fairness, Bias, Toxicity, Efficiency, General Info, Summarization, Disinformation, Copyright, Classification, APPS Metrics, and BBQ Metrics | Download Page |
BLURB | Link | 404 | - NER<br>- Question answering<br>- Information retrieval<br>- Relation extraction<br>- Semantic similarity<br>- Classification | Accuracy (F1, correlation) | Download Page |
GLUE | Link | Github | - Question answering<br>- Semantic similarity<br>- NLI<br>- Sentiment analysis<br>- Coreference resolution | Accuracy (F1, correlation) | Download Page |
SuperGLUE | Link | Github | - Question answering<br>- Reasoning<br>- Classification<br>- Text entailment<br>- Coreference resolution | Accuracy (F1, correlation) | Download Page |
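
Several of the benchmarks above report F1 and correlation scores. As a rough illustration of what those two numbers measure, here is a minimal pure-Python sketch. The helper names `f1_score` and `pearson` are hypothetical and written only for this README; they are not taken from any benchmark's official evaluation code, which may apply different averaging (for example micro or macro F1) or a different correlation (for example Spearman).

```python
from math import sqrt


def f1_score(gold, pred, positive=1):
    """Binary F1 for one positive label (illustrative helper, not an official scorer)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy labels and similarity scores, made up for illustration only.
    print(f1_score([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

F1 is typically used for extraction and classification tasks such as NER and relation extraction, while correlation is typically used for semantic similarity tasks where the model predicts a continuous score.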

A comprehensive review paper on knowledge-enhanced LLMs: A Survey of Knowledge Enhanced Pre-Trained Language Models.

A comparable work on knowledge-enhanced medical LLMs: SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining.