Self-supervised Federated Learning (SSL-FL)
Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging
TL;DR: PyTorch implementation of the self-supervised federated learning framework proposed in our paper, which simulates self-supervised classification on multi-institutional medical imaging data under federated learning.
- Our framework employs masked image encoding as the self-supervised task to learn efficient representations from images (see the sketch below).
- Extensive experiments are performed on diverse medical datasets, including retinal images, dermatology images, and chest X-rays.
- In particular, we implement BEiT and MAE as the self-supervised learning modules.
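As an illustration of the masked-image-encoding objective, here is a minimal, MAE-style masking sketch in PyTorch. It is not the repository's training code; the 75% masking ratio and the (B, N, D) patch-token shape are assumptions for exposition.

```python
import torch

def random_masking(patch_tokens, mask_ratio=0.75):
    """MAE-style random masking of patch tokens.

    patch_tokens: (B, N, D) patch embeddings from a ViT patch-embed layer.
    Returns the visible tokens and a boolean mask (True = masked patch).
    """
    B, N, D = patch_tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=patch_tokens.device)  # random score per patch
    keep_idx = noise.argsort(dim=1)[:, :n_keep]           # keep lowest-scored patches
    visible = torch.gather(patch_tokens, 1,
                           keep_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, dtype=torch.bool, device=patch_tokens.device)
    mask.scatter_(1, keep_idx, False)                     # False = visible
    return visible, mask

# The encoder sees only `visible`; a reconstruction (or token-prediction)
# loss is computed on the masked positions indicated by `mask`.
```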
Reference
If you find our work helpful in your research, or if you use any source code or datasets, please cite our paper. The BibTeX entry is listed below:
@article{yan2023label,
  title={Label-efficient self-supervised federated learning for tackling data heterogeneity in medical imaging},
  author={Yan, Rui and Qu, Liangqiong and Wei, Qingyue and Huang, Shih-Cheng and Shen, Liyue and Rubin, Daniel and Xing, Lei and Zhou, Yuyin},
  journal={IEEE Transactions on Medical Imaging},
  year={2023},
  publisher={IEEE}
}
Pre-requisites:
- NVIDIA GPU (tested on 4x NVIDIA Tesla V100 32GB and 8x NVIDIA GeForce RTX 2080 Ti) on local workstations
- Python (3.8.12), torch (1.7.1), timm (0.3.2), numpy (1.21.2), pandas (1.4.2), scikit-learn (1.0.2), scipy (1.7.1), seaborn (0.11.2)
Set Up Environment
conda env create -f environment.yml
Data Preparation
Please refer to SSL-FL/data for information on the directory structures of data folders, download links to datasets, and instructions on how to train on custom datasets.
Self-supervised Federated Learning for Medical Image Classification
In this paper, we selected ViT-B/16 as the backbone for all methods. The specifications for BEiT-B are as follows: #layers=12; hidden size=768; FFN factor=4x; #heads=12; patch size=16x16 (#parameters: 86M).
Please refer to SSL-FL/data for links to the pre-trained checkpoints that were used to generate the results.
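For reference, a backbone with these specifications can be instantiated through timm. This is an illustrative sketch only: `num_classes=5` is a placeholder for your target dataset, and the actual experiments load the pre-trained checkpoints linked above.

```python
import timm

# ViT-B/16: 12 layers, hidden size 768, 12 heads, 16x16 patches (~86M parameters)
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=5)
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params / 1e6:.1f}M parameters')
```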
Self-supervised Federated Pre-training and Fine-tuning
Sample scripts for running Fed-BEiT and Fed-MAE pre-training and fine-tuning on the Retina dataset are provided in SSL-FL/code/fed_beit/script/retina (Fed-BEiT) and SSL-FL/code/fed_mae/script/retina (Fed-MAE).
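For intuition, federated pre-training alternates local self-supervised updates with server-side weight averaging. Below is a minimal FedAvg-style aggregation sketch; it is our illustration, not the repository's implementation, and `client_weights` is assumed to hold each client's data-size fraction.

```python
import copy

def fedavg(client_state_dicts, client_weights):
    """Weighted average of client model weights (FedAvg-style aggregation).

    client_state_dicts: list of model.state_dict() from each client.
    client_weights: per-client fractions (e.g., local dataset sizes), summing to 1.
    """
    global_state = copy.deepcopy(client_state_dicts[0])
    for key in global_state:
        global_state[key] = sum(w * sd[key]
                                for sd, w in zip(client_state_dicts, client_weights))
    return global_state

# Each round: clients run local masked-image pre-training, send weights back,
# and the server broadcasts the fedavg() result as the new global model.
```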
To run Fed-BEiT, please download the DALL-E tokenizer weights and save encoder.pkl and decoder.pkl to SSL-FL/data/tokenizer_weight:
wget https://cdn.openai.com/dall-e/encoder.pkl
wget https://cdn.openai.com/dall-e/decoder.pkl
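After downloading, the encoder can be loaded with OpenAI's dall_e package (a sketch assuming `pip install dall-e`; in BEiT-style pre-training, the encoder turns image patches into the discrete visual tokens that masked patches are trained to predict):

```python
import torch
from dall_e import load_model  # pip install dall-e

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Maps images to a discrete visual-token vocabulary; these tokens are the
# prediction targets for masked patches in BEiT-style pre-training.
tokenizer_encoder = load_model('SSL-FL/data/tokenizer_weight/encoder.pkl', device)
```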
Acknowledgements
- This repository is based on BEiT and MAE.
- The main FL setup is based on the prior work "Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning".