CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities
This repo contains a PyTorch implementation of the paper CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities (MICCAI 2024).
Note: This repository will be updated in the next few days for improved readability, easier environment setup, and dataset management. Please stay tuned!
Illustration of CAR-MFL: (a) Multimodal client with access to multimodal data. (b) Multimodal federated system with missing modality. (c) Image client with only image samples; the missing text modality is retrieved via our Cross-Modal Augmentation module. (d) Cross-Modal Augmentation procedure for a query image (yellow): the most relevant image from the public data is retrieved based on distance in feature space and label similarity. The text associated with the retrieved image is then paired with the query image, forming a paired input.
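The retrieval step in (d) can be illustrated with a small NumPy sketch. This is not the code from this repository: the function name `retrieve_text`, the parameter `k`, the Euclidean distance, and the Jaccard label similarity are all assumptions made for illustration, as is the way the two criteria are combined (shortlist the `k` nearest public images, then pick the one whose labels best match the query).

```python
import numpy as np

def retrieve_text(query_feat, query_label, pub_feats, pub_labels, pub_texts, k=3):
    """Return (index, text) of the public sample chosen for a query image.

    Hypothetical helper: shortlist the k nearest public images in feature
    space, then keep the one with the highest label similarity.
    """
    # Euclidean distance from the query to every public image feature.
    dists = np.linalg.norm(pub_feats - query_feat, axis=1)
    # Indices of the k nearest public images, closest first.
    k = min(k, len(dists))
    nearest = np.argsort(dists, kind="stable")[:k]
    # Jaccard similarity between multi-hot label vectors (assumed metric).
    inter = np.logical_and(pub_labels[nearest], query_label).sum(axis=1)
    union = np.logical_or(pub_labels[nearest], query_label).sum(axis=1)
    jaccard = np.where(union > 0, inter / np.maximum(union, 1), 1.0)
    # argmax returns the first maximum, so ties go to the closest candidate.
    best = int(nearest[int(np.argmax(jaccard))])
    return best, pub_texts[best]

# Toy public set: three images with 2-D features and 3-class multi-hot labels.
pub_feats = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
pub_labels = np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]])
pub_texts = ["report A", "report B", "report C"]

query_feat = np.array([0.2, 0.0])
query_label = np.array([0, 1, 1])

idx, text = retrieve_text(query_feat, query_label, pub_feats, pub_labels, pub_texts, k=2)
print(idx, text)
```

In this toy example the nearest public image has no labels in common with the query, so the second-nearest image is selected and its text is paired with the query image.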
This branch contains the code for the homogeneous setup. For the heterogeneous setup, refer to the Hetergenous branch. The heterogeneous setup will be updated soon.
The packages required by the environment we used for our experiments are listed in environment.yml.
For datasets, please download MIMIC-CXR-JPG and resize all images to 256x256. Preprocessed annotations can be accessed here.
Please note that the dataloader loads images from relative paths stored in the dictionary of preprocessed annotations, so you may need to modify these relative paths accordingly. However, to ensure reproducibility, please do not change the order of data items in the annotations, as the client data mappings depend on that order.
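The resizing step could be done with a short Pillow script along the following lines. This is only a sketch, not the preprocessing script used for the paper: the function name `resize_images`, the directory arguments, the bicubic filter, and the choice to ignore the aspect ratio are all assumptions.

```python
from pathlib import Path
from PIL import Image

def resize_images(src_dir, dst_dir, size=(256, 256)):
    """Resize every .jpg under src_dir to `size`, mirroring the folder layout in dst_dir.

    Hypothetical helper; returns the number of images written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    count = 0
    for src in sorted(src_dir.rglob("*.jpg")):
        # Keep the same relative path so annotation paths stay easy to adapt.
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # Plain resize: the aspect ratio is not preserved (an assumption).
            img.resize(size, Image.BICUBIC).save(dst)
        count += 1
    return count
```

Calling `resize_images("path/to/original", "path/to/resized")` with two separate directories keeps the originals untouched. Because the relative layout is preserved, only the root of the relative paths in the annotations would need adjusting.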
To reproduce CAR-MFL with 8 image-only clients, run the following shell command:
python main.py --name $EXP_NAME --algorithm fedavgRAG --exp_dir $OUTPUT_DIR --seed $SEED --num_clients 4 --img_clients 6 --txt_clients 0 --alpha 0.3 --server_config_path configs/fedavgin_server.yaml --client_config_path configs/client_configs.yaml --use_refinement
where,
- num_clients = number of multimodal clients
- img_clients = number of unimodal image clients
- txt_clients = number of unimodal text clients
If you find that the paper provides insights into multimodal FL, or find our code useful 🤗, please consider citing:
We would like to thank the authors of the CreamFL repository for their code.