CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities

This repo contains a PyTorch implementation of the paper CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities (MICCAI 2024).

Note: This repository will be updated in the next few days for improved readability, easier environment setup, and dataset management. Please stay tuned!

Illustration of CAR-MFL: (a) Multimodal client with access to multimodal data. (b) Multimodal federated system with missing modalities. (c) Image client with only image samples; the missing text modality is retrieved via our Cross-Modal Augmentation module. (d) Cross-Modal Augmentation procedure for a query image (yellow): the most relevant image from the public data is retrieved based on distance in feature space and label similarity. The text associated with the retrieved image is then paired with the query image to form a paired input.
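The retrieval step in (d) can be sketched as follows. This is a minimal illustration, not the code in this repository: the Euclidean distance, the Jaccard label similarity, the "top-k then re-rank by labels" order, and the names `retrieve_text`, `public_data`, `feat`, `labels`, and `text` are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(a, b):
    """Jaccard similarity between two multi-hot label vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def retrieve_text(query_feat, query_labels, public_data, top_k=3):
    """Return the text of the public sample chosen for the query image.

    public_data is a list of dicts with the (hypothetical) keys
    "feat", "labels", and "text".
    """
    # Step 1: keep the top_k public images closest in feature space.
    nearest = sorted(
        public_data, key=lambda item: euclidean(query_feat, item["feat"])
    )[:top_k]
    # Step 2: among those, prefer the one whose labels best match the query.
    # max() keeps the first maximum, so ties go to the nearer image.
    best = max(nearest, key=lambda item: jaccard(query_labels, item["labels"]))
    return best["text"]


# Toy public set: A and B are close to the query, C is far away.
public_data = [
    {"feat": [0.0, 0.0], "labels": [1, 0, 0], "text": "report A"},
    {"feat": [0.1, 0.0], "labels": [0, 1, 0], "text": "report B"},
    {"feat": [5.0, 5.0], "labels": [0, 1, 0], "text": "report C"},
]
paired_text = retrieve_text([0.04, 0.0], [0, 1, 0], public_data, top_k=2)
print(paired_text)
```

In this toy example, image A is slightly closer to the query, but B shares the query's label, so B's report is paired with the query image.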

Setup

This branch contains the code for the homogeneous setup. For the heterogeneous setup, refer to the Hetergenous branch. The heterogeneous setup will be updated soon.

Environment

The required packages of the environment we used to conduct experiments are listed in environment.yml.

Datasets

For datasets, please download MIMIC-CXR-JPG and resize all of the images to 256x256. Preprocessed annotations can be accessed here.
Please note that the dataloader loads images from the relative paths stored in the dictionary of preprocessed annotations, so you may need to modify those relative paths to match your local layout. However, to ensure reproducibility, please do not change the order of the data items in the annotations, because the mapping of data to clients depends on that order.
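One way to adjust the stored paths without disturbing item order is sketched below. The annotation layout is an assumption: the example treats the annotations as a list of dicts with a hypothetical `image_path` key, and `rebase_image_paths`, `old_root`, and `new_root` are illustrative names, so adapt them to the actual structure of the downloaded file.

```python
import posixpath


def rebase_image_paths(annotations, old_root, new_root, path_key="image_path"):
    """Return a copy of annotations with image paths moved under new_root.

    Items are processed and returned in their original order, so any
    client mapping that depends on item position is unaffected.
    """
    rebased = []
    for item in annotations:
        # Path of the image relative to the old dataset root.
        rel = posixpath.relpath(item[path_key], old_root)
        updated = dict(item)  # shallow copy; the input is left untouched
        updated[path_key] = posixpath.join(new_root, rel)
        rebased.append(updated)
    return rebased


# Hypothetical annotation entries, for illustration only.
annotations = [
    {"image_path": "data/files/p10/s1/a.jpg", "label": 1},
    {"image_path": "data/files/p11/s2/b.jpg", "label": 0},
]
rebased = rebase_image_paths(annotations, "data/files", "/mnt/mimic-cxr-jpg-256")
for entry in rebased:
    print(entry["image_path"])
```

Only the path field changes; the number and order of items stay the same.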

Usage

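To reproduce CAR-MFL with image-only clients, run the following shell command. As written, it uses 4 multimodal clients and 6 image-only clients; change --img_clients to use a different number of image-only clients (for example, 8):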

python main.py --name $EXP_NAME --algorithm fedavgRAG --exp_dir $OUTPUT_DIR --seed $SEED --num_clients 4 --img_clients 6 --txt_clients 0 --alpha 0.3 --server_config_path configs/fedavgin_server.yaml --client_config_path configs/client_configs.yaml --use_refinement

where,

num_clients = No. of multimodal clients
img_clients = No. of unimodal image clients
txt_clients = No. of unimodal text clients
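When running several client configurations, it can help to build the command programmatically. The sketch below only assembles the argument list from the flags shown above; `build_command` and the example values `carmfl_demo` and `outputs` are hypothetical and not part of this repository.

```python
import shlex


def build_command(exp_name, output_dir, seed, num_clients, img_clients, txt_clients=0):
    """Assemble the main.py argument list using the flags documented above."""
    return [
        "python", "main.py",
        "--name", exp_name,
        "--algorithm", "fedavgRAG",
        "--exp_dir", output_dir,
        "--seed", str(seed),
        "--num_clients", str(num_clients),   # multimodal clients
        "--img_clients", str(img_clients),   # unimodal image clients
        "--txt_clients", str(txt_clients),   # unimodal text clients
        "--alpha", "0.3",
        "--server_config_path", "configs/fedavgin_server.yaml",
        "--client_config_path", "configs/client_configs.yaml",
        "--use_refinement",
    ]


cmd = build_command("carmfl_demo", "outputs", seed=0, num_clients=4, img_clients=6)
command_line = shlex.join(cmd)  # shell-safe string form of the command
print(command_line)
# To launch the run from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```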

Citation

If you find that the paper provides insights into multimodal FL, or find our code useful 🤗, please consider citing our paper.

Acknowledgements

We would like to thank the authors of the CreamFL repository for their code.