MI-Zero

Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images, CVPR 2023. [HTML] [ArXiv] [Video] [Cite]

Ming Y. Lu*, Bowen Chen*, Andrew Zhang, Drew F. K. Williamson, Richard J. Chen, Tong Ding, Long Phi Le, Yung-Sung Chuang, Faisal Mahmood

@InProceedings{Lu_2023_CVPR,
    author    = {Lu, Ming Y. and Chen, Bowen and Zhang, Andrew and Williamson, Drew F. K. and Chen, Richard J. and Ding, Tong and Le, Long Phi and Chuang, Yung-Sung and Mahmood, Faisal},
    title     = {Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {19764-19775}
}

© This code and the corresponding models are made available for non-commercial academic purposes and are licensed under the Creative Commons Attribution Non Commercial No Derivatives 4.0 International license. Commercial entities may contact us or the Mass General Brigham Innovations office.

Installation

To install dependencies, clone the repository and run:

conda env create -f env.yml
conda activate mizero
pip install ./assets/timm_ctp.tar --no-deps

How to use

Data

MI-Zero can be applied to any dataset of whole slide images. In our paper, we reported results primarily from an in-house dataset. For reproducibility, we also tested on some subsets of WSIs from The Cancer Genome Atlas. These results are in the Supplementary Material of the paper. Below we provide a quick example using a subset of cases for TCGA RCC subtyping.

Prepare patches

To extract patches from WSIs, we used CLAM, but other packages can be used as long as the patching outputs are stored in the .h5 format, where each .h5 file refers to a WSI and the 'coords' key points to a numpy array of N x 2 coordinates for N patches from that WSI.
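As a rough illustration of the layout described above, the following sketch writes and reads a coordinate file with h5py. The helper names (check_coords, write_coords_h5, read_coords_h5) are hypothetical and are not part of MI-Zero or CLAM; only the 'coords' key and the N x 2 shape come from the description above. Files produced by CLAM may carry additional attributes that this sketch ignores.

```python
import numpy as np


def check_coords(coords):
    """Return coords as an array, or raise if it is not N x 2."""
    coords = np.asarray(coords)
    if coords.ndim != 2 or coords.shape[1] != 2:
        raise ValueError(f"expected an N x 2 array, got shape {coords.shape}")
    return coords


def write_coords_h5(path, coords):
    """Write patch coordinates for one WSI under the 'coords' key."""
    import h5py  # imported lazily so the validation helper works without it

    with h5py.File(path, "w") as f:
        f.create_dataset("coords", data=check_coords(coords))


def read_coords_h5(path):
    """Read the 'coords' array back and validate its shape."""
    import h5py

    with h5py.File(path, "r") as f:
        return check_coords(f["coords"][:])
```

Running read_coords_h5 on a few of your own patching outputs is a quick way to confirm they match this layout before extracting embeddings.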

Prepare patch embeddings for MI-Zero

The extract_embeddings.py script assumes the WSIs are .svs files and extracts embeddings from 20x patches:

python extract_embeddings.py --csv_path ./data_csvs/tcga_rcc_zeroshot_example.csv --h5_source <path_to_h5_files> --wsi_source <path_to_wsi_files> --save_dir <where_to_save_embeddings> --ckpt_path <path_to_checkpoint> --device cuda:0

Inference on whole slide images

To run MI-Zero after extracting patch embeddings, use slidelevel_zeroshot_multiprompt.py. An example command is provided for TCGA RCC subtyping:

python slidelevel_zeroshot_multiprompt.py --task RCC_subtyping --embeddings_dir <path_to_rcc_embeddings> --dataset_split ./data_csvs/tcga_rcc_zeroshot_example.csv --topj 1 5 50 --prompt_file ./prompts/rcc_prompts.json --model_checkpoint ./logs/ctranspath_448_bioclinicalbert/checkpoints/epoch_50.pt
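The --topj values are consistent with the top-K pooling described in the paper, where patch-level similarity scores are averaged over the K highest-scoring patches per class to obtain a slide-level score. The following is a minimal NumPy sketch of that idea under this assumption. It is not the code in slidelevel_zeroshot_multiprompt.py, and the function name and array layouts are hypothetical. It also assumes a single text embedding per class, whereas the script name suggests several prompts per class.

```python
import numpy as np


def topk_pool_predict(patch_emb, class_emb, k):
    """Predict a slide-level class by top-K pooling of patch similarities.

    patch_emb: (N, D) array, one embedding per patch (assumed layout).
    class_emb: (C, D) array, one text embedding per class (assumed layout).
    k: number of highest-scoring patches to average per class.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    patch_emb = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    class_emb = class_emb / np.linalg.norm(class_emb, axis=1, keepdims=True)

    sims = patch_emb @ class_emb.T  # (N, C) patch-to-class similarities
    k = max(1, min(k, sims.shape[0]))  # clamp k to the number of patches

    # Sort each class column in ascending order and keep the k largest scores.
    topk = np.sort(sims, axis=0)[-k:]  # (k, C)
    slide_scores = topk.mean(axis=0)  # (C,) one pooled score per class
    return int(slide_scores.argmax()), slide_scores
```

With k=1 the slide score is driven by the single most similar patch, while larger k averages over more tissue, which is one reason to compare several values as in the example command.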

Pretrained models

Since some of our models were pretrained on proprietary in-house data, we are only able to release encoder weights that were pretrained entirely on publicly available data. The checkpoints can be found here.

Specifically, we release two models, trained with bioclinicalbert and pubmedbert as the text encoder, respectively.

bioclinicalbert: ctranspath_448_bioclinicalbert/checkpoints/epoch_50.pt
pubmedbert: ctranspath_448_pubmedbert/checkpoints/epoch_50.pt

Once the weights are downloaded, place them in the ./src/logs/ directory so that their paths relative to the MI-Zero repository root are:

./src/logs/ctranspath_448_bioclinicalbert/checkpoints/epoch_50.pt
./src/logs/ctranspath_448_pubmedbert/checkpoints/epoch_50.pt
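Before running inference, it can help to confirm that the downloaded weights are where the scripts expect them. A small sketch, assuming the ./src/logs/ directory named above and the two checkpoint paths listed in this section; missing_checkpoints is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Checkpoint paths relative to the logs directory, as listed in this section.
CHECKPOINTS = [
    "ctranspath_448_bioclinicalbert/checkpoints/epoch_50.pt",
    "ctranspath_448_pubmedbert/checkpoints/epoch_50.pt",
]


def missing_checkpoints(logs_dir="./src/logs"):
    """Return the relative checkpoint paths that are not present under logs_dir."""
    root = Path(logs_dir)
    return [rel for rel in CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", *missing, sep="\n  ")
    else:
        print("All checkpoints found.")
```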

Contact

For any questions, please open a new issue or reach out to us by email at mingylu@mit.edu or bchen18@bwh.harvard.edu.

Acknowledgements

The repo was partly inspired by open-source repositories such as openclip, timm and huggingface transformers. We thank the authors and developers for their contributions.

License

This work is under the Creative Commons Attribution Non Commercial No Derivatives 4.0 International license.

Funding

This work was funded by NIH NIGMS R35GM138216.

Citation

If you find our work useful, please cite our paper: Lu, M.Y., Chen, B., Zhang, A., Williamson, D.F., Chen, R.J., Ding, T., Le, L.P., Chuang, Y.S. and Mahmood, F., 2023. Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 19764-19775).

