This is the code accompanying the paper "PyFed: Extending PySyft with N-IID Federated Learning Benchmark" (paper link: https://caiac.pubpub.org/pub/7yr5bkck/release/1).
PyFed is a benchmarking framework for federated learning that extends PySyft in a generic and distributed way. It supports different aggregation methods and data distributions: Independent and Identically Distributed (IID) and Non-IID.
In this sense, PyFed is an alternative to the LEAF benchmarking framework for federated learning, built on top of PySyft.
The benchmarking is done using five datasets: `mnist`, `fashionmnist`, `cifar10`, `sent140`, and `shakespeare`.
To install the tested stable dependencies, use the pip package manager to install the requirements of PyFed:

```
pip install -r requirements.txt
```
Package | Description
---|---
models | Model definitions used for training (e.g. CNN, LSTM, GRU).
datasets | Dataset loading and splitting across workers (IID and Non-IID).
aggregation | Aggregation methods for FL.
run | Scripts for launching the workers and the training.
utils | Utility functions.
data | Downloading the dataset.
results | Results of the training.
experiments | Benchmarking configuration.
To run PyFed, follow these steps:
- Launch the workers: `python run/network/start_websocket_server.py [arguments]`
- Launch the training: `python run/training/main.py [arguments]`
- Get the results.
All arguments have default values. However, these arguments should be set to the desired settings either manually or using a config file.
Workers can be launched with the following arguments:
Argument | Description
---|---
clients | The number of clients: Integer.
dataset | The dataset to be used: `mnist`, `fashionmnist`, `cifar10`, `sent140`, or `shakespeare`.
split_mode | The split mode used: `iid` or `niid`.
global_dataset | Share a global dataset over all clients.
data_rate | Percentage of samples from the global dataset to be added: 0.x.
add_error | Add error to some samples: `True` or `False`.
error_rate | Percentage of error to be added: 0.x (see the sketch below).
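When `add_error` is set, a fraction of the local data is corrupted according to `error_rate`. The minimal sketch below assumes the error is injected by flipping a random `error_rate` fraction of the labels; PyFed's actual implementation may perturb samples differently.

```python
import random

def corrupt_labels(labels, error_rate, num_classes, seed=0):
    """Flip a random `error_rate` fraction of labels to a different class.

    Hypothetical illustration of `add_error`/`error_rate`; the real PyFed
    error injection may work differently.
    """
    rng = random.Random(seed)
    labels = list(labels)
    n_noisy = int(error_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_noisy):
        wrong = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(wrong)
    return labels

# Example: corrupt 10% of a toy set of MNIST-style labels (0-9).
print(corrupt_labels(list(range(10)) * 5, error_rate=0.1, num_classes=10))
```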
In the case of an IID distribution (`split_mode = iid`), the following arguments are available:

Argument | Description
---|---
iid_share | Share samples between clients in the IID split mode.
iid_rate | Percentage of samples to share between clients: 0.x (see the sketch below).
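As a rough illustration of these two arguments, the sketch below deals sample indices out evenly to the workers and, when `iid_share` is enabled, copies an `iid_rate` fraction of each client's shard to the other clients. This is an assumed reading of the options, not PyFed's actual splitting code.

```python
import random

def iid_split(num_samples, clients, iid_share=False, iid_rate=0.1, seed=0):
    """Shuffle sample indices and deal them out evenly to `clients` workers."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    shards = [indices[c::clients] for c in range(clients)]
    if iid_share:
        # Assumed behaviour: every client also receives an `iid_rate` fraction
        # of each other client's shard.
        shared = [rng.sample(s, int(iid_rate * len(s))) for s in shards]
        shards = [s + [i for c, sh in enumerate(shared) if c != owner for i in sh]
                  for owner, s in enumerate(shards)]
    return shards

# Example: 60,000 MNIST-sized samples over 5 workers, sharing 10% between clients.
parts = iid_split(60_000, clients=5, iid_share=True, iid_rate=0.1)
print([len(p) for p in parts])  # each worker ends up with 16,800 indices
```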
In the case of a Non-IID distribution (`split_mode = niid`), the following arguments are available:

Argument | Description
---|---
data_size | The number of samples held by each client: Integer.
type | The split type: `random` or `label` split (a label-split sketch follows the launch examples below).
label_num | The number of classes held by a client with the `label` split type: Integer.
share_samples | How to share samples between clients holding the same classes (used with the `label` split type): Integer.
Manually:

```
python run/network/start_websocket_server.py --clients=5 \
    --dataset=mnist \
    --split_mode=niid \
    --type=label \
    --data_size=[234,2134,64,4132,1000] \
    --label_num=[3,8,5,2,3] \
    --share_samples=2
```
Or using a config.yml file:

```
python run/network/start_websocket_server.py -f file_name
```
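To make the label-based Non-IID split more concrete, the sketch below assigns each client samples drawn from a limited number of classes, mirroring the `label_num` values in the command above. It is only a minimal sketch of the idea, assuming one shard per client; the real split in PyFed (including `data_size` and `share_samples` handling) may differ.

```python
import random
from collections import defaultdict

def label_split(labels, label_num_per_client, seed=0):
    """Give client i samples drawn only from `label_num_per_client[i]` classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = list(by_class)
    shards = []
    for n_classes in label_num_per_client:
        chosen = rng.sample(classes, n_classes)
        shards.append([idx for c in chosen for idx in by_class[c]])
    return shards

# Example: a toy 10-class dataset split over clients holding 3, 8, and 5 classes.
toy_labels = [random.randrange(10) for _ in range(1_000)]
for i, shard in enumerate(label_split(toy_labels, [3, 8, 5])):
    print(f"client {i}: {len(shard)} samples")
```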
After launching the workers correctly, we are ready to start the training using the following arguments.
Argument | Description
---|---
model | The file name (without the .py extension) of the model to be trained (see the models directory): `cnn`, `lstm`.
batch_size | The batch size used for training: Integer.
test_batch_size | The batch size used for the test data: Integer.
training_rounds | The number of federated learning rounds: Integer.
federate_after_n_batches | The number of training steps performed on each remote worker before averaging: Integer.
lr | The learning rate: Float.
cuda | Whether to use CUDA: `True` or `False`.
seed | The seed used for randomization: Integer.
eval_every | Evaluate the model every n rounds: Integer.
fraction_client | The number of clients participating in each round: Integer.
optimizer | The optimizer to use: `SGD` or `Adam`.
aggregation | The aggregation method: `federated_avg` or `weighted_avg` (see the sketch after the examples below).
loss | The loss function: `nll_loss` or `cross_entropy`.
Manually:

```
python run/training/main.py --model=cnn \
    --dataset=mnist \
    --batch_size=10 \
    --lr=0.1 \
    --training_rounds=100 \
    --eval_every=10 \
    --optimizer=SGD \
    --aggregation=federated_avg \
    --loss=nll_loss
```
Or using a config.yml file:

```
python run/training/main.py -f file_name
```
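The two `aggregation` options correspond to plain and sample-weighted averaging of the client models. The sketch below illustrates the arithmetic on plain PyTorch state dicts; it is not the implementation PyFed invokes through PySyft.

```python
import torch

def federated_avg(client_states):
    """Plain federated averaging: element-wise mean of the clients' weights."""
    return {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
            for k in client_states[0]}

def weighted_avg(client_states, num_samples):
    """Weighted averaging: each client counts proportionally to its data size."""
    total = sum(num_samples)
    return {k: sum(s[k] * (n / total) for s, n in zip(client_states, num_samples))
            for k in client_states[0]}

# Example: average two toy "models" that each consist of a single 2x2 weight tensor.
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(federated_avg([a, b])["w"])             # 0.5 everywhere
print(weighted_avg([a, b], [300, 100])["w"])  # 0.75 everywhere
```

With `weighted_avg`, clients holding more samples pull the global model further toward their local weights, which matters under the Non-IID splits where `data_size` varies across clients.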
Below are the results obtained with the PyFed framework. You can check all the results and configurations in the experiments package.
Benchmark configuration.
The total number of clients is 100.
Dataset | Model | Epochs | Batch size | Fraction | Learning rate | Rounds
---|---|---|---|---|---|---
Cifar10 | CNN | 1 | 5 | 0.1 | 0.1 | 2500
Fashionmnist | CNN | 1 | 10 | 0.1 | 0.1 | 100
Mnist | CNN | 1 | 10 | 0.1 | 0.1 | 20
Mnist | CNN (with batch normalisation) | 1 | 10 | 0.1 | 0.1 | 20
Sent140 | LSTM | 1 | 1 | 0.1 | 0.1 | 1000
Shakespeare | GRU | 1 | 1 | 0.1 | 0.8 | 2000
Results with the IID distribution:

Dataset | Model | Accuracy (%) | Loss
---|---|---|---
Cifar10 | CNN | 67 | 0.8043
Fashionmnist | CNN | 86.81 | 0.368
Mnist | CNN | 95.63 | 0.1384
Mnist | CNN (with batch normalisation) | 96.33 | 0.1154
Sent140 | LSTM | 65.45 | 0.8345
Shakespeare | GRU | 50.36 | 1.2452
Results with the Non-IID distribution (Types 0, 1, and 2 refer to the label split; the last two columns use the random split):

Dataset | Model | Type 0 Accuracy (%) | Type 0 Loss | Type 1 Accuracy (%) | Type 1 Loss | Type 2 Accuracy (%) | Type 2 Loss | Random Accuracy (%) | Random Loss
---|---|---|---|---|---|---|---|---|---
Cifar10 | CNN | 66.78 | 0.8132 | 65.89 | 0.8453 | 65.45 | 0.8464 | 66.89 | 0.8121
Fashionmnist | CNN | 85.36 | 0.4029 | 85.8 | 0.3956 | 85.42 | 0.4009 | 86.57 | 0.3727
Mnist | CNN | 93.45 | 0.2171 | 93.88 | 0.2164 | 93.84 | 0.2086 | 95.04 | 0.1671
Mnist | CNN (with batch normalisation) | 94.25 | 0.1902 | 94.74 | 0.1771 | 94.76 | 0.1884 | 96.09 | 0.13
Sent140 | LSTM | 64.4 | 0.9244 | 64.23 | 0.9445 | 65.78 | 0.8123 | 65.1 | 0.8663
Shakespeare | GRU | 48.26 | 1.3452 | 48.76 | 1.2052 | 45.23 | 1.7452 | 49.46 | 1.2952
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.