# Pathology Feature Extractors and Foundation Models

We are witnessing the emergence of many new feature extractors trained using self-supervised learning on large pathology datasets. This repository aims to provide a comprehensive list of these models, alongside key information about them.

I aim to keep this list up to date as new models are released, but please submit a pull request or open an issue for any models I have missed!

## Patch-level models

| Name | Group | Weights | Released | SSL | WSIs | Tiles | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Input size | Dataset | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CTransPath | Sichuan University / Tencent AI Lab | | Dec 2021* | SRCL | 32K | 16M | | | | Swin-Transformer | | 768 | 224 | TCGA, PAIP | |
| RetCCL | Sichuan University / Tencent AI Lab | | Dec 2021* | CCL | 32K | 16M | | | | ResNet-50 | | 2048 | 224 | TCGA, PAIP | |
| REMEDIS | Google Research | | May 2022* | SimCLR/BiT | 29K | 50M | 11K cases | 4096 | 1.2M | ResNet-50 | | 2048 | 224 | TCGA | |
| HIPT | Mahmood Lab | | Jun 2022* | DINOv1 | 11K | 100M | | 256 | 400K | ViT-S | | 384 | 256 | TCGA | |
| Lunit-DINO | Lunit | | Dec 2022* | DINOv1 | 21K | | | | | ViT-S | | 384 | 224 | TCGA | |
| Lunit-{BT,MoCoV2,SwAV} | Lunit | | Dec 2022* | {BT,MoCoV2,SwAV} | 21K | | | | | ResNet-50 | | 2048 | 224 | TCGA | |
| Phikon | Owkin | | Jul 2023* | iBOT | 6.1K | 43M | 5.6K | 1440 | 155K | ViT-B | 86M | 768 | 224 | TCGA | |
| CONCH | Mahmood Lab | | Jul 2023* | iBOT & vision-language pretraining | 21K | 16M | | 1024 | 80 epochs | ViT-B | 86M | 768 | 224 | proprietary | |
| UNI | Mahmood Lab | | Aug 2023* | DINOv2 | 100K | 100M | | | | ViT-L | | 1024 | 224 | proprietary (Mass-100K) | |
| **Virchow** | Paige / Microsoft | | Sep 2023* | DINOv2 | 1.5M | | 120K | | | ViT-H | 632M | 2560 | 224 | proprietary (from MSKCC) | |
| **Campanella et al. (MAE)** | Thomas Fuchs Lab | | Oct 2023* | MAE | 420K | 3.3B | 77K | 1080 | 1.3K INE | ViT-L | 303M | | 224 | proprietary (MSHS) | |
| **Campanella et al. (DINO)** | Thomas Fuchs Lab | | Oct 2023* | DINOv1 | 420K | 3.3B | 77K | 1440 | 2.5K INE | ViT-L | 303M | | 224 | proprietary (MSHS) | |
| Path Foundation | Google | | Oct 2023* | SimCLR, MSN | 6K | 60M | | 1024 | | ViT-S | | 384 | 224 | TCGA | |
| PathoDuet | Shanghai Jiao Tong University | | Dec 2023* | inspired by MoCoV3 | 11K | 13M | | 2048 | 100 epochs | ViT-B | | 4096 | 224 | TCGA | |
| RudolfV | Aignostics | | Jan 2024* | DINOv2 | 100K | 750M | 36K | | | ViT-L | | | 224 | proprietary (from EU & US), TCGA | |
| kaiko | kaiko.ai | | Mar 2024* | DINOv2 | 29K | 260M** | | 512 | 200 INE | ViT-L | | 1024 | 224 | TCGA | |
| **PLUTO** | PathAI | | May 2024* | DINOv2 (+ MAE and Fourier loss) | 160K | 200M | | | | FlexiViT-S | 22M | | 224 | proprietary (PathAI) | |
| BEPH | Shanghai Jiao Tong University | | May 2024* | BEiTv2 | 12K | 12M | | 1024 | | ViT-B | 193M | 1024 | 224 | TCGA | |
| **Prov-GigaPath** | Microsoft / Providence | | May 2024* | DINOv2 | 170K | 1.4B | 30K | 384 | | ViT | | 1536 | 224 | proprietary (Providence) | |
| **Hibou-B** | HistAI | | Jun 2024* | DINOv2 | 1.1M | 510M | 310K cases | 1024 | 500K | ViT-B | 86M | 768 | 224 | proprietary | |
| **Hibou-L** | HistAI | | Jun 2024* | DINOv2 | 1.1M | 1.2B | 310K cases | 1024 | 1.2M | ViT-L | 304M | 1024 | 224 | proprietary | |
| **H-optimus-0** | Bioptimus | | Jul 2024* | DINOv2/iBOT | 500K (across 4,000 clinics) | >100M | 200K | | | ViT-G with 4 registers | 1.1B | 1536 | 224 | proprietary | |
| mSTAR | Smart Lab | | Jul 2024* | mSTAR (multimodal) | 10K | | 10K | | | ViT-L | | | 224 | TCGA | |
| **Virchow 2** | Paige / Microsoft | | Aug 2024* | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 4096 | | ViT-H with 4 registers | 632M | 3584 | 224 | proprietary (from MSKCC and international sites) | |
| **Virchow 2G** | Paige / Microsoft | | Aug 2024* | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 3072 | | ViT-G with 8 registers | 1.9B | 3584 | 224 | proprietary (from MSKCC and international sites) | |
| Phikon-v2 | Owkin | | Sep 2024* | DINOv2 | 58.4K | 456M | | 4096 | 250K | ViT-L | 307M | 1024 | 224 | PANCAN-XL (TCGA, CPTAC, GTEx, proprietary) | |

Notes:

- Models trained on >100K slides may be considered foundation models and are marked in **bold**.
- Numbers of WSIs, tiles, and patients are reported to 2 significant figures.
- INE = ImageNet epochs (training length expressed as equivalent epochs over ImageNet-1k, ~1.28M images per epoch).
- Rows are ordered chronologically by release date.
- Some of these feature extractors have been evaluated in a benchmarking study for whole-slide classification here.
- `**` means the value was inferred from other numbers provided in the paper.
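
As a quick illustration of how the patch-level models above are used, here is a minimal sketch of feature extraction with Phikon via the `transformers` library, following Owkin's model card (the checkpoint name `owkin/phikon` comes from there; the input file `patch_224.png` is a placeholder). Other models ship differently, e.g. UNI and the Virchow family load through `timm`, and several checkpoints are gated behind an access request.

```python
# Minimal sketch: patch-level feature extraction with Phikon (ViT-B, 768-dim).
# Assumes the owkin/phikon checkpoint on the Hugging Face Hub; other models in
# the table use different loaders and some require gated access.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

processor = AutoImageProcessor.from_pretrained("owkin/phikon")
model = ViTModel.from_pretrained("owkin/phikon", add_pooling_layer=False)
model.eval()

# Placeholder path: one 224x224 H&E tile extracted from a WSI.
patch = Image.open("patch_224.png").convert("RGB")
inputs = processor(images=patch, return_tensors="pt")

with torch.inference_mode():
    outputs = model(**inputs)

# The CLS token of the last layer serves as the patch embedding: shape (1, 768).
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)
```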

## Slide-level / patient-level models

This table includes models that produce slide-level or patient-level embeddings without supervision; a generic sketch of how patch embeddings are aggregated into a slide embedding follows the table.

| Name | Group | Weights | Released | SSL | WSIs | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Patch size | Dataset | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GigaSSL | CBIO | | Dec 2022* | SimCLR | 12K | | | 1K epochs | ResNet-18 | | 256 | 256 | TCGA | |
| **PRISM** | Paige / Microsoft | | May 2024* | contrastive (with language) | 590K (190K text reports) | 190K | 64 (x4) | 75K (10 epochs) | Perceiver + BioGPT | | 1280 | 224 | proprietary | |
| **Prov-GigaPath** | Microsoft / Providence | | May 2024* | DINOv2 | 170K | 30K | | | LongNet | 86M | 1536 | 224 | proprietary (Providence) | |
| MADELEINE | Mahmood Lab | | Aug 2024* | contrastive (InfoNCE & OT) | 16K | 2K | 120 | 90 epochs | multi-head attention MIL | | 512 | 256 | ACROBAT, BWH Kidney (proprietary) | |
| CHIEF | Yu Lab | | Sep 2024* | | | | | | | | | | | |
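
To make concrete how models in this table turn a bag of patch embeddings into a single slide-level vector, below is a generic sketch of gated attention-based MIL pooling (in the style of Ilse et al., 2018). It is illustrative only and not the architecture of any specific model above (PRISM uses a Perceiver, Prov-GigaPath a LongNet); the dimensions and the random input bag are placeholders.

```python
# Generic sketch: attention-based MIL pooling of patch embeddings into one
# slide-level embedding. Illustrative only; not any listed model's architecture.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, embed_dim: int = 768, attn_dim: int = 256):
        super().__init__()
        # Gated attention: each patch embedding gets a scalar relevance score.
        self.attn_V = nn.Sequential(nn.Linear(embed_dim, attn_dim), nn.Tanh())
        self.attn_U = nn.Sequential(nn.Linear(embed_dim, attn_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(attn_dim, 1)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (num_patches, embed_dim) for a single slide.
        scores = self.attn_w(self.attn_V(patch_embeddings) * self.attn_U(patch_embeddings))
        weights = torch.softmax(scores, dim=0)          # (num_patches, 1), sums to 1
        return (weights * patch_embeddings).sum(dim=0)  # (embed_dim,)

# Usage: pool a bag of, say, 1,000 patch embeddings from one slide.
pooler = AttentionMILPooling(embed_dim=768)
bag = torch.randn(1000, 768)  # placeholder for real patch embeddings
slide_embedding = pooler(bag)
print(slide_embedding.shape)  # torch.Size([768])
```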