
PyFed

This is the code accompanying the paper "PyFed: Extending PySyft with N-IID Federated Learning Benchmark". Paper link: https://caiac.pubpub.org/pub/7yr5bkck/release/1

About

PyFed is a benchmarking framework for federated learning that extends PySyft in a generic and distributed way. PyFed supports different aggregation methods and data distributions: Independent and Identically Distributed (IID) and Non-IID.

In this sense, PyFed is an alternative to the LEAF benchmarking framework for federated learning, built on PySyft.

The benchmarking is done using five datasets: mnist, fashionmnist, cifar10, sent140, and shakespeare.

Table of Contents

  • Installation
  • Content
  • Usage
  • Results
  • Contributing

Installation

Dependencies


Install Dependencies

Use the package manager pip to install the requirements of PyFed.

pip install -r requirements.txt

Content

| Package | Description |
|---|---|
| models | ML models for each dataset; micro-loss and macro-loss metrics |
| datasets | Preprocessing for each dataset; data loader and data splitting |
| aggregation | Aggregation methods for FL |
| run | Starting the workers; launching the training process |
| utils | Framework arguments; utility functions |
| data | Downloading the datasets |
| results | Results of the training |
| experiments | Benchmarking configuration |

Usage

To run PyFed, follow these steps:

  1. Launch the workers: python run/network/start_websocket_server.py [arguments]
  2. Launch the training: python run/training/main.py [arguments]
  3. Get the results

All arguments have default values, but they can be set to the desired settings either on the command line or through a config file.

Launching the Workers

Workers can be launched with the following arguments.

Arguments

| Argument | Description |
|---|---|
| clients | The number of clients: Integer |
| dataset | Dataset to be used: mnist, fashionmnist, cifar10, sent140, shakespeare |
| split_mode | The split mode used: iid or niid |
| global_dataset | Share a global dataset over all clients |
| data_rate | Percentage of samples in the global dataset to be added: 0.x |
| add_error | Add error to some samples: True or False |
| error_rate | Percentage of error to be added: 0.x |

In the case of IID distribution (split_mode = iid), the following arguments are available:

| Argument | Description |
|---|---|
| iid_share | Share samples between clients in the iid split mode |
| iid_rate | Percentage of samples to share between clients: 0.x |
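For illustration, an IID split along these lines might look as follows. Note that iid_split is a hypothetical helper sketching the idea, not PyFed's actual code:

```python
import random

def iid_split(indices, n_clients, iid_share=False, iid_rate=0.1, seed=0):
    """Shuffle sample indices and partition them evenly across clients.

    If iid_share is set, each client additionally receives a random
    iid_rate fraction of every other client's samples.
    """
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // n_clients
    parts = [shuffled[i * size:(i + 1) * size] for i in range(n_clients)]
    if iid_share:
        # sample the shared fraction from each partition before any extension
        extras = [rng.sample(p, int(len(p) * iid_rate)) for p in parts]
        for i, part in enumerate(parts):
            for j, extra in enumerate(extras):
                if i != j:
                    part.extend(extra)
    return parts
```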

In the case of Non-IID distribution (split_mode = niid), the following arguments are available:

| Argument | Description |
|---|---|
| data_size | The number of samples held by each client: Integer |
| type | The split type: random or label |
| label_num | The number of classes held by a client with the label split type: Integer |
| share_samples | How samples are shared between clients holding the same classes; with the label split type, the following values are possible: |

  • 0: clients holding the same class also hold the same samples
  • 1: clients holding the same class might share some of the same samples (random sampling)
  • 2: clients holding the same class hold different samples from that class
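The three share_samples modes can be sketched as follows. Here label_split is a hypothetical helper, not PyFed's actual API, and mode 1 simply samples half of each class pool at random:

```python
import random

def label_split(samples_by_class, label_num, n_clients, share_samples=2, seed=0):
    """Assign each client `label_num` classes, then pick that class's samples
    according to the share_samples mode (0, 1, or 2)."""
    rng = random.Random(seed)
    labels = list(samples_by_class)
    cursors = dict.fromkeys(labels, 0)        # per-class offset for mode 2
    clients = []
    for _ in range(n_clients):
        chosen = rng.sample(labels, label_num)
        indices = []
        for lbl in chosen:
            pool = samples_by_class[lbl]
            if share_samples == 0:
                # mode 0: every client holding this class gets the same samples
                indices.extend(pool)
            elif share_samples == 1:
                # mode 1: random sampling, so clients may overlap
                indices.extend(rng.sample(pool, len(pool) // 2))
            else:
                # mode 2: disjoint shards of this class, one per client
                shard = len(pool) // n_clients
                indices.extend(pool[cursors[lbl]:cursors[lbl] + shard])
                cursors[lbl] += shard
        clients.append(indices)
    return clients
```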

Example

Manually

python run/network/start_websocket_server.py --clients=5 \
  --dataset=mnist \
  --split_mode=niid \
  --type=label \
  --data_size=[234,2134,64,4132,1000] \
  --label_num=[3,8,5,2,3] \
  --share_samples=2

Or using config.yml

python run/network/start_websocket_server.py -f file_name
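A config file for the same example might look like the following sketch. The keys are assumed to mirror the command-line arguments above; the exact schema may differ:

```yaml
clients: 5
dataset: mnist
split_mode: niid
type: label
data_size: [234, 2134, 64, 4132, 1000]
label_num: [3, 8, 5, 2, 3]
share_samples: 2
```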

Launching the training

After launching the workers, the training can be started with the following arguments.

Arguments

| Argument | Description |
|---|---|
| model | The file name (without the .py extension) of the model to be trained (see the models directory): cnn, lstm |
| batch_size | The batch size for training: Integer |
| test_batch_size | The batch size for the test data: Integer |
| training_rounds | The number of federated learning rounds: Integer |
| federate_after_n_batches | The number of training steps performed on each remote worker before averaging: Integer |
| lr | The learning rate: Float |
| cuda | Use CUDA: True or False |
| seed | The seed used for randomization: Integer |
| eval_every | Evaluate the model every n rounds: Integer |
| fraction_client | The fraction of clients participating in each round |
| optimizer | The optimizer to use: SGD or Adam |
| aggregation | The aggregation method: federated_avg or weighted_avg |
| loss | The loss function: nll_loss or cross_entropy |
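The two aggregation options compute either a uniform or a sample-weighted average of the client models. A minimal sketch with scalar parameters, illustrative only and not PyFed's implementation:

```python
def federated_avg(states):
    """Uniform average of client parameter dictionaries."""
    n = len(states)
    return {k: sum(s[k] for s in states) / n for k in states[0]}

def weighted_avg(states, n_samples):
    """Average weighted by each client's number of training samples."""
    total = sum(n_samples)
    return {k: sum(s[k] * w for s, w in zip(states, n_samples)) / total
            for k in states[0]}
```

With real models, the same formulas apply element-wise to each parameter tensor.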

Example

Manually

python run/training/main.py --model=cnn \
  --dataset=mnist \
  --batch_size=10 \
  --lr=0.1 \
  --training_rounds=100 \
  --eval_every=10 \
  --optimizer=SGD \
  --aggregation=federated_avg \
  --loss=nll_loss

Using a config.yml file

python run/training/main.py -f file_name

Results

The results below were obtained with the PyFed framework. All results and configurations can be found in the experiments package.

Configuration

The benchmark configuration is given below. The total number of clients is 100.

| Dataset | Model | Epochs | Batch size | Fraction | Learning rate | Rounds |
|---|---|---|---|---|---|---|
| Cifar10 | CNN | 1 | 5 | 0.1 | 0.1 | 2500 |
| Fashionmnist | CNN | 1 | 10 | 0.1 | 0.1 | 100 |
| Mnist | CNN | 1 | 10 | 0.1 | 0.1 | 20 |
| Mnist | CNN (with batch normalisation) | 1 | 10 | 0.1 | 0.1 | 20 |
| Sent140 | LSTM | 1 | 1 | 0.1 | 0.1 | 1000 |
| Shakespeare | GRU | 1 | 1 | 0.1 | 0.8 | 2000 |

Results for IID distributions

| Dataset | Model | Accuracy (%) | Loss |
|---|---|---|---|
| Cifar10 | CNN | 67 | 0.8043 |
| Fashionmnist | CNN | 86.81 | 0.368 |
| Mnist | CNN | 95.63 | 0.1384 |
| Mnist | CNN (with batch normalisation) | 96.33 | 0.1154 |
| Sent140 | LSTM | 65.45 | 0.8345 |
| Shakespeare | GRU | 50.36 | 1.2452 |

Results for Non-IID distributions

 

Types 0, 1, and 2 refer to the share_samples values of the label split; the last two columns are the random split.

| Dataset | Model | Type 0 Accuracy (%) | Type 0 Loss | Type 1 Accuracy (%) | Type 1 Loss | Type 2 Accuracy (%) | Type 2 Loss | Random Accuracy (%) | Random Loss |
|---|---|---|---|---|---|---|---|---|---|
| Cifar10 | CNN | 66.78 | 0.8132 | 65.89 | 0.8453 | 65.45 | 0.8464 | 66.89 | 0.8121 |
| Fashionmnist | CNN | 85.36 | 0.4029 | 85.8 | 0.3956 | 85.42 | 0.4009 | 86.57 | 0.3727 |
| Mnist | CNN | 93.45 | 0.2171 | 93.88 | 0.2164 | 93.84 | 0.2086 | 95.04 | 0.1671 |
| Mnist | CNN (with batch normalisation) | 94.25 | 0.1902 | 94.74 | 0.1771 | 94.76 | 0.1884 | 96.09 | 0.13 |
| Sent140 | LSTM | 64.4 | 0.9244 | 64.23 | 0.9445 | 65.78 | 0.8123 | 65.1 | 0.8663 |
| Shakespeare | GRU | 48.26 | 1.3452 | 48.76 | 1.2052 | 45.23 | 1.7452 | 49.46 | 1.2952 |

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.