
FLASH

  • An Open Source Heterogeneity-Aware Federated Learning Platform
  • This repository is based on a fork of Leaf, a benchmark for federated settings.

This repository contains the code and experiments for the paper:

WWW'21

Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

What is FLASH?

Briefly speaking, we developed FLASH to incorporate heterogeneity into the federated learning simulation process. FLASH mainly follows Google's FL protocol, so compared to other platforms it adds many additional system configurations, e.g., a round deadline. For these configurations, see the details in the config file.

Heterogeneity

Hardware Heterogeneity: Each client is bundled with a device type. Each device type has different training and network speeds. We also support a self-defined device type (-1) whose parameters can be set manually for more complex simulations.

The source code for measuring the on-device training time is available in the OnDeviceTraining directory.
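
To illustrate, below is a minimal sketch of how such device profiles could translate into a simulated per-client round time; the profile numbers and the estimate_round_time() helper are hypothetical and not taken from FLASH's code.

# A minimal sketch (not FLASH's actual code) of per-device-type speeds
# driving a simulated client round time; all numbers are illustrative.
DEVICE_PROFILES = {
    # device_type: (training speed in samples/s, network speed in KB/s)
    0: (20.0, 500.0),     # low-end phone
    1: (60.0, 2000.0),    # mid-range phone
    2: (150.0, 8000.0),   # high-end phone
}

def estimate_round_time(device_type, num_samples, num_epochs, model_size_kb,
                        custom_profile=None):
    """Estimate a client's round time: local training plus uploading the model."""
    # device type -1 stands for a self-defined profile supplied by the user
    if device_type == -1:
        train_speed, net_speed = custom_profile
    else:
        train_speed, net_speed = DEVICE_PROFILES[device_type]
    train_time = num_epochs * num_samples / train_speed   # on-device training time (s)
    upload_time = model_size_kb / net_speed                # model upload time (s)
    return train_time + upload_time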

State (Behavior) Heterogeneity: the state and running environment of participating clients can be varied and dynamic. We follow Google's FL system, i.e., clients are available for training only when the device is idle, charging, and connected to WiFi. To simulate state heterogeneity, we provide a default state trace, which can be accessed here. This default trace is sampled from the large-scale real-world trace (as used in our paper) that involves up to 136k devices.

Note: FLASH will run in a heterogeneity-unaware (ideal) mode if the trace file is not found or if both hard_hete and behav_hete are set to False.
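
For illustration, a minimal sketch of how a state trace could gate client availability is given below; the trace schema and helper names here are assumptions, not FLASH's actual format (see models/client.py for how the real trace file is parsed).

# Illustrative availability check against a state trace. The assumed format is
# per-client lists of [start, end] active intervals in seconds of simulated time.
import json

def load_trace(path='/path/to/state_traces.json'):
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)   # e.g. {"client_0": [[0, 300], [900, 1800]], ...}

def is_available(trace, client_id, sim_time):
    # a client can be selected only while sim_time falls inside an active interval
    return any(start <= sim_time < end for start, end in trace.get(client_id, []))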

How to run it

Example

# 1. Clone and install requirements
git clone https://github.com/PKU-Chengxu/FLASH.git
pip3 install -r requirements.txt

# 2. Change state traces (optional)
# We provide a default state trace containing 1000 devices' data, located in the ./data/ directory.
# If you want to use a self-collected trace, modify the file path in models/client.py, i.e., with open('/path/to/state_traces.json', 'r', encoding='utf-8') as f:

# 3. Download a benchmark dataset; go to the respective dataset directory `data/$DATASET` for instructions on generating it

# 4. Run
cd models/
python3 main.py [--config yourconfig.cfg]
# use the --config option to specify the config file; default.cfg will be used if not specified
# the output log is CONFIG_FILENAME.log

Config File

To simplify the command line arguments, we move most of the parameters to a config file. Below is a detailed example.
## whether to consider heterogeneity
behav_hete False # bool, whether to simulate state(behavior) heterogeneity
hard_hete False # bool, whether to simulate hardware heterogeneity, i.e., different on-device training times and network speeds


## no training mode to tune system configurations
no_training False # bool, whether to run in no_training mode, skip training process if True


## ML related configurations
dataset femnist # dataset to use
model cnn # file that defines the DNN model
learning_rate 0.01 # learning-rate of DNN
batch_size 10 # batch-size for training 


## system configurations, refer to https://arxiv.org/abs/1812.02903 for more details
num_rounds 500 # number of FL rounds to run
clients_per_round 100 # expected clients in each round
min_selected 60 # min selected clients number in each round, fail if not satisfied
max_sample 340 #  max number of samples to use in each selected client
eval_every 5 # evaluate every # rounds, -1 to disable evaluation
num_epochs 5 # number of training epochs (E) for each client in each round
seed 0 # basic random seed
round_ddl 270 0 # μ and σ for deadline, which follows a normal distribution
update_frac 0.8  # min update fraction in each round; the round fails if the fraction of clients that successfully upload their updates is less than update_frac
max_client_num -1 # max number of clients in the simulation process, -1 for infinite


### ----- NOTE! below are advanced configurations. 
### ----- Strongly recommend: specify these configurations only after reading the source code. 
### ----- Configuration items of [aggregate_algorithm, fedprox*, structure_k, qffl*] are mutually-exclusive 

## basic algorithm
aggregate_algorithm SucFedAvg # choose in [SucFedAvg, FedAvg], please refer to models/server.py for more details. In the configuration file, SucFedAvg refers to the "FedAvg" algorithm described in https://arxiv.org/pdf/1602.05629.pdf

## compression algorithm
# compress_algo grad_drop # gradient compression algorithm, choose in [grad_drop, sign_sgd], not used if commented out
# structure_k 100
## the k for structured updates, not used if commented out; please refer to the arXiv paper for more details

## advanced aggregation algorithms
# fedprox True # whether to apply FedProx and the params it needs; please refer to the MLSys'20 paper (https://arxiv.org/pdf/1812.06127.pdf) for more details
# fedprox_mu 0.5
# fedprox_active_frac 0.8

# qffl True # whether to apply q-FFL (q-FedAvg) and the params it needs; please refer to the ICLR'20 paper (https://arxiv.org/pdf/1905.10497.pdf) for more details
# qffl_q 5
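
To make the system parameters above concrete, the following sketch shows how a deadline drawn from round_ddl, together with min_selected and update_frac, could decide whether a round succeeds; it is an illustration only, not FLASH's actual scheduler, and the per-client completion times are assumed inputs.

# Illustrative sketch of one simulated round. completion_times are the
# per-client training + upload times in seconds, assumed to be given.
import random

def run_round(completion_times, round_ddl=(270, 0), clients_per_round=100,
              min_selected=60, update_frac=0.8, seed=0):
    random.seed(seed)
    mu, sigma = round_ddl
    deadline = random.gauss(mu, sigma)               # deadline ~ N(mu, sigma)
    selected = completion_times[:clients_per_round]  # stand-in for client selection
    if len(selected) < min_selected:
        return False                                 # not enough clients selected
    uploaded = [t for t in selected if t <= deadline]
    # the round succeeds only if enough selected clients finish before the deadline
    return len(uploaded) / len(selected) >= update_frac

For reference, the sample-weighted averaging rule that the aggregation options build on (FedAvg, as described in the paper linked above) can be sketched as follows; this is an illustration rather than the code in models/server.py.

# Minimal sketch of sample-weighted model averaging (the FedAvg rule);
# SucFedAvg in FLASH averages only the clients that uploaded successfully.
import numpy as np

def aggregate(updates):
    """updates: list of (num_samples, model_weights) from successful clients."""
    total = sum(n for n, _ in updates)
    return sum(n * np.asarray(w) for n, w in updates) / total

# e.g. two clients holding 30 and 70 samples
new_global = aggregate([(30, [1.0, 2.0]), (70, [3.0, 4.0])])  # -> [2.4, 3.4]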

Benchmark Datasets

FEMNIST

  • Overview: Image Dataset
  • Details: 62 different classes (10 digits, 26 lowercase, 26 uppercase), images are 28 by 28 pixels (with option to make them all 128 by 128 pixels), 3500 users
  • Task: Image Classification

Celeba

Reddit

  • Overview: We preprocess the Reddit data released by pushshift.io corresponding to December 2017.
  • Details: 1,660,820 users with a total of 56,587,343 comments.
  • Task: Next-word Prediction.

Results in the paper

Config files and results are in the paper_experiments folder. You can simply modify models/default.cfg and then run python main.py to reproduce all the experiments in our paper. The experiments can be divided into the following categories:

  • Basic FL algorithm
  • Advanced FL algorithms
  • Breakdown of Heterogeneity
  • Device Failure
  • Participation Bias

Update 06/02/2022:

We provide a shell script to reproduce all the paper experiments. The script is <FLASH>/paper_experiments/run.sh. You can run it as follows:

cd paper_experiments
chmod +x run.sh
./run.sh [CONFIG_FILE]

This script uses models/default.cfg by default; you can also specify the config file to use. CONFIG_FILE is a path relative to the paper_experiments directory. All of the experiment config files we use are in the paper_experiments folder.

On-device Training

The code we used to measure the on-device training time is in the OnDeviceTraining folder. Please refer to its documentation for more details.

Notes

  • Please consider citing our paper if you use the code or data in your research project.
@inproceedings{yang2019characterizing,
  title={Characterizing impacts of heterogeneity in federated learning upon large-scale smartphone data},
  author={Yang, Chengxu and Wang, Qipeng and Xu, Mengwei and Chen, Zhenpeng and Bian, Kaigui and Liu, Yunxin and Liu, Xuanzhe},
  booktitle={The World Wide Web Conference},
  year={2021}
}