RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval (Medical Image Analysis)

Please open new threads or address all questions to xiyue.wang.scu@gmail.com

We built a better and stronger pre-trained model for a variety of histopathological image applications. It outperforms ImageNet pre-trained features by a large margin on the benchmarks reported below. We release our best model and invite researchers to test it on their own computational pathology tasks.

Hardware

128 GB of RAM
32 × NVIDIA V100 GPUs (32 GB each)

Preparations

1. Download all 32,000 TCGA WSIs.

2. Download all 2,457 PAIP WSIs. Together, these yield about 15,000,000 images (~100 TB). This cost us $400,000, spent to advance the progress of digital pathology.

Pre-trained models for histopathological image tasks

The pre-trained model is available here.

1. Classification through search

This is the most direct way to evaluate the discriminative power of the provided features.

Dataset: TissueNet (values in %)
Method	Acc@1	Acc@3	Acc@5	mMV@5
ImageNet	50.35	77.65	87.68	46.15
CCL (ours)	67.09	87.81	93.4	70.1

Dataset: UniToPatho (values in %)
Method	Acc@1	Acc@3	Acc@5	mMV@5
ImageNet	58.17	82.89	89.45	59.01
CCL (ours)	66.55	84.32	90.31	68.35
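As a rough illustration of how retrieval metrics like those in the tables above can be computed, here is a minimal sketch. It assumes that Acc@k counts a query as correct when any of its top-k retrieved items shares the query's label, and that the majority-vote score counts a query as correct when the most frequent label among its top-k matches the query's label. These definitions, and the function names `acc_at_k` and `majority_vote_at_k`, are assumptions for illustration and may differ from the exact evaluation protocol used for the reported numbers.

```python
from collections import Counter


def acc_at_k(query_labels, retrieved_labels, k):
    """Fraction of queries with at least one correct label in the top-k results.

    retrieved_labels[i] is the ranked list of labels (best match first)
    returned for query i.
    """
    hits = [q in r[:k] for q, r in zip(query_labels, retrieved_labels)]
    return sum(hits) / len(hits)


def majority_vote_at_k(query_labels, retrieved_labels, k=5):
    """Fraction of queries whose label equals the majority label of the top-k.

    Ties are broken in favour of the label that appears first in the ranking.
    """
    correct = 0
    for q, r in zip(query_labels, retrieved_labels):
        top_label, _ = Counter(r[:k]).most_common(1)[0]
        correct += int(top_label == q)
    return correct / len(query_labels)


# Toy example with two queries and five retrieved labels each.
queries = ["a", "b"]
retrieved = [["b", "a", "a", "a", "c"], ["a", "a", "b", "b", "b"]]
print(acc_at_k(queries, retrieved, 1), acc_at_k(queries, retrieved, 3))
print(majority_vote_at_k(queries, retrieved, 5))
```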

2. Multiple Instance Learning for Whole Slide Image Classification

Methods for this task are currently built on ImageNet pre-trained features, so swapping in our features also verifies the superiority of our feature extractor.

Dataset: TCGA-NSCLC
Method	Accuracy	AUC
ABMIL	0.7719	0.8656
MIL-RNN	0.8619	0.9107
DSMIL	0.8058	0.8925
TransMIL	0.8835	0.9603
CLAM	0.8422	0.9377
CLAM+CCL (ours)	0.911	0.967
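To make concrete how per-patch features feed a slide-level MIL classifier, here is a minimal sketch of attention-based pooling in the style of ABMIL. It is not the CLAM implementation and not the code behind the numbers above. The weights are random and untrained, and the feature dimension, hidden size, and the function name `attention_pool` are illustrative assumptions.

```python
import numpy as np


def attention_pool(patch_feats, V, w):
    """Aggregate patch features into one slide feature with attention weights.

    patch_feats: (n_patches, d) array of features from a patch encoder.
    V: (d, hidden) projection matrix, w: (hidden,) scoring vector.
    Returns (slide_feature of shape (d,), attention weights of shape (n_patches,)).
    """
    h = np.asarray(patch_feats, dtype=np.float64)
    scores = np.tanh(h @ V) @ w          # one scalar score per patch
    scores = scores - scores.max()       # stabilise the softmax numerically
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ h, attn


# Toy bag: 10 patches with 8-dimensional features (sizes are illustrative).
rng = np.random.default_rng(0)
bag = rng.normal(size=(10, 8))
slide_feat, attn = attention_pool(bag, rng.normal(size=(8, 4)), rng.normal(size=4))
print(slide_feat.shape, attn.sum())
```

In a real pipeline, `V` and `w` would be learned jointly with a classifier on top of the pooled slide feature.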

3. Classification based on features using SVM

This task follows the setup used by KimiaNet.

Dataset: Colorectal cancer dataset
Method	Accuracy (%)
Combined features	87.40
Fine-tuned VGG-19	86.19
Ensemble of CNNs	92.83
KimiaNet	96.80
CCL (ours)	98.40
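A minimal sketch of training an SVM on extracted features is shown below, using scikit-learn. The linear kernel, the feature standardisation, the 70/30 split, and the function name `svm_accuracy` are assumptions for illustration, not the protocol behind the numbers above. The synthetic features stand in for real patch features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svm_accuracy(features, labels, seed=0):
    """Fit a linear SVM on a stratified 70/30 split and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Synthetic stand-in for extracted features: two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 16)), rng.normal(5.0, 1.0, (50, 16))])
y = np.array([0] * 50 + [1] * 50)
print(svm_accuracy(X, y))
```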

To compute the features:

python get_feature.py

We recommend first extracting features at 1.0 mpp (microns per pixel), and then trying other magnifications.
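Slides are scanned at different native resolutions, so extracting at a fixed mpp usually means reading a larger or smaller region and resizing it. The sketch below computes how many native-resolution pixels to read per patch side. The helper name `read_size_for_target_mpp` and the 256-pixel output patch size are assumptions for illustration and are not taken from get_feature.py.

```python
def read_size_for_target_mpp(native_mpp, target_mpp=1.0, patch_size=256):
    """Side length, in native-resolution pixels, of the region to read so that
    resizing it to patch_size x patch_size gives the target mpp.

    patch_size=256 is an illustrative assumption.
    """
    if native_mpp <= 0 or target_mpp <= 0:
        raise ValueError("mpp values must be positive")
    scale = target_mpp / native_mpp  # how many native pixels map to one output pixel
    return round(patch_size * scale)


# A slide scanned at 0.25 mpp needs a 4x larger region than the output patch.
print(read_size_for_target_mpp(0.25))
print(read_size_for_target_mpp(0.5))
```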

To fine-tune the model:

python resnet_lincls.py

Whole-Slide Images retrieval

Please refer to FISH. Use our features for the clustering and search steps, then remove the tree and search the features directly.
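One way to read "search directly" is a brute-force nearest-neighbour search over the feature vectors. The sketch below ranks database features by cosine similarity to each query. The choice of cosine similarity and the function name `search_top_k` are assumptions for illustration, and this is not code from FISH or from this repository.

```python
import numpy as np


def search_top_k(query_feats, db_feats, k=5):
    """Return indices and cosine similarities of the k closest database items.

    query_feats: (n_queries, d), db_feats: (n_db, d). Results are sorted
    from most to least similar for each query.
    """
    q = np.asarray(query_feats, dtype=np.float64)
    d = np.asarray(db_feats, dtype=np.float64)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)  # L2-normalise rows
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = q @ d.T                                    # cosine similarity matrix
    idx = np.argsort(-sims, axis=1, kind="stable")[:, :k]
    return idx, np.take_along_axis(sims, idx, axis=1)


# Toy database of three 2-D features and a single query.
db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, sims = search_top_k([[1.0, 0.1]], db, k=3)
print(idx[0], sims[0])
```

Brute-force search scales linearly with the database size, so for very large collections an approximate nearest-neighbour index may be preferable.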

License

RetCCL is released under the GPLv3 License and is available for non-commercial academic purposes.

Citation

If you find our work useful in your research, please cite this paper using the entry below.

@article{WANG2023102645,
title = {RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval},
author = {Xiyue Wang and Yuexi Du and Sen Yang and Jun Zhang and Minghui Wang and Jing Zhang and Wei Yang and Junzhou Huang and Xiao Han},
journal = {Medical Image Analysis},
volume = {83},
pages = {102645},
year = {2023},
issn = {1361-8415}
}
