/MOE-FL

The implementation of the "Robust Federated Learning by Mixture of Experts" study.

Primary LanguagePythonApache License 2.0Apache-2.0

Robust Federated Learning by Mixture of Experts (MoE-FL)

This repository contains implementation of "Robust Federated Learning by Mixture of Experts". This study presents a novel weighted average model based on the mixture of experts (MoE) concept to provide robustness in Federated learning (FL) against the poisoned/corrupted/outdated local models.

Getting Started

We have tested and run the code using Ubuntu 20.04. You can follow the instruction below on Ubuntu 20.04, or you may install appropriate libraries considering your distribution.

Requirements

Before running the experiments, make sure you have installed the following packages system-wide:

sudo apt install \
    python3-dev build-essential libsrtp2-dev libavformat-dev \
    libavdevice-dev python3-wheel python3-venv

After installing the required tools, you might want to create a Python virtual environment for easier management of python packages. Therefore:

# Create Python virtual environment
python3 -m venv moe-fl-vene

# Activate virtual environment
source moe-fl-venv/bin/activate

# Do not forget to upgrade the pip and install wheel/setuptools
pip install --upgrade pip 
pip install wheel setuptools

Install python packages using:

pip install -r requirements.txt

Configurations

You can change the configuration by modifying configs/defaults.yml. Some of the available parameters and their default values that you can change are:

runtime:
    epochs: 1
    rounds: 400
    lr: 0.01
    momentum: 0.5
    batch_size: 20
    random_seed: 12345
    weight_decay: 0.0
    test_batch_size: 20
    use_cuda: False # We have run all experiments using CPU only, not tested CUDA

attack:
    attackers_num: 50
    attack_type: 1

server:
    data_fraction: 0.15 # Fraction of data that will be shared with the server

mnist:
    load_fraction: 1
    shards_num: 200 # With 200 shards, there would be 300 samples per each shards
    shards_per_worker_num: 2
    selected_users_num: 30 
    total_users_num: 100 # Total number of users to partion data among them

Experiments

You have two options to run the experiments, either with IID or Non-IID datasets. To do so, you can run run-study-iid-mnist.py or run-study-niid-mnist.py, respectively. You can overwrite some values like epochs and rounds in configs/default.yml when you run the script using command-line options. Options in brackets "[ ]" are optional and options in parens "( )" are required.

run-study-iid-mnist.py 
        (--avg | --opt) \ # average mode or optimized (moe-fl) mode
        [--epochs=NUM] \
        [--rounds=NUM] \
        [--attack-type=ID] \
        [--attackers-num=num] \
        [--selected-workers=NUM] \
        [--log] \ # Enable logging
        [--nep-log] \ # Enable neptune logging
        [--output-prefix=NAME] 

⚠️ Due to a bug in PySyft version < 0.2.9, you cannot run a long experiment because of memory leakage. As of submitting this study, there was no solution to this problem. Hence, we have to manually break the experiments and save the states of each run and continue again. Therefore, there is an argument --start-round in the Non-IID experiment.

run-study-niid-mnist.py 
        (--avg | --opt) \ # average mode or optimized (moe-fl) mode
        --start-round=NUM \ # start from previously run experiment (see warning above)
        --rounds=NUM \
        [--not-pure] \ # force the experiment use not pure dataset
        [--attackers-num=num] \
        [--log] \ # Enable logging
        [--nep-log] \ # Enable neptune logging
        [--output-prefix=NAME] 

Results and Output

Each execution of the code will generate the following files:

  • accuracy: Accuracy for each round
  • all_users: List of all workers names
  • attackers: List of attackers out of all workers
  • configs: Saved configuration file used when experiment started
  • mapped_dataset: Binary file, used to handle memory leakage problem of PySyft. We have to manage the database given to each worker and maintain the same allocation in all subsequent running.
  • models: Folder containing saved models of workers and the server
  • seeds: Seeds used for the experiment
  • selected_workers: List of selected workers out of all workers
  • server_pub_dataset: Binary file, used for saving the dataset of server
  • train_loss: Loss of each round

Notes

  • In order to successfully run the experiment, you need to configure cvxpy package and provide appropriate solvers. We use mosek, which requires a license to work correctly.
  • We use supervisor to run the experiment. Each experiment runs for a specified number of rounds. Then it exits and gets restarted by the supervisor. This is necessary as PySyft does not handle memory properly and it will lead to memory leakage otherwise. You can check a few examples of how we utilize this approach by checking the supervisor folder.
  • In order to use the neptune logging system, you have to set the environment variables properly before starting the experiments. Please refer to their website for more information.

Citation

@article{parsaeefard2021robust,
  title={Robust Federated Learning by Mixture of Experts},
  author={Parsaeefard, Saeedeh and Etesami, Sayed Ehsan and Garcia, Alberto Leon},
  journal={arXiv preprint arXiv:2104.11700},
  year={2021}
}