This is the code accompanying the paper "PyFed: Extending PySyft with N-IID Federated Learning Benchmark" (paper link: https://caiac.pubpub.org/pub/7yr5bkck/release/1).
PyFed is a benchmarking framework for federated learning that extends PySyft in a generic and distributed way. It supports different aggregation methods and data distributions: Independent and Identically Distributed (IID) and Non-IID.
In this sense, PyFed is an alternative to the LEAF benchmarking framework for federated learning, built on top of PySyft.
The benchmarking is done using five datasets: `mnist`, `fashionmnist`, `cifar10`, `sent140`, and `shakespeare`.
To install the tested stable dependencies, use the pip package manager to install the requirements of PyFed:

```
pip install -r requirements.txt
```
Package | Description
---|---
models | Model definitions used for training (e.g. CNN, LSTM, GRU).
datasets | Dataset loading and splitting across workers (IID and Non-IID).
aggregation | Aggregation methods for FL.
run | Scripts for launching the workers and the training.
utils | Utility functions.
data | Downloading the dataset.
results | Results of the training.
experiments | Benchmarking configuration.
To run PyFed, follow these steps:
- Launch the workers: `python run/network/start_websocket_server.py [arguments]`
- Launch the training: `python run/training/main.py [arguments]`
- Get the results.
All arguments have default values. However, these arguments should be set to the desired settings either manually or using a config file.
Workers can be launched with the following arguments:
Argument | Description
---|---
clients | The number of clients: Integer.
dataset | The dataset to be used: `mnist`, `fashionmnist`, `cifar10`, `sent140`, or `shakespeare`.
split_mode | The split mode used: `iid` or `niid`.
global_dataset | Share a global dataset over all clients.
data_rate | Percentage of samples from the global dataset to be added: 0.x.
add_error | Add error to some samples: `True` or `False`.
error_rate | Percentage of error to be added: 0.x (see the sketch below).
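When `add_error` is set, a fraction of the local data is corrupted according to `error_rate`. The minimal sketch below assumes the error is injected by flipping a random `error_rate` fraction of the labels; PyFed's actual implementation may perturb samples differently.

```python
import random

def corrupt_labels(labels, error_rate, num_classes, seed=0):
    """Flip a random `error_rate` fraction of labels to a different class.

    Hypothetical illustration of `add_error`/`error_rate`; the real PyFed
    error injection may work differently.
    """
    rng = random.Random(seed)
    labels = list(labels)
    n_noisy = int(error_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_noisy):
        wrong = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(wrong)
    return labels

# Example: corrupt 10% of a toy set of MNIST-style labels (0-9).
print(corrupt_labels(list(range(10)) * 5, error_rate=0.1, num_classes=10))
```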
In the case of an IID distribution (`split_mode = iid`), the following arguments are available:

Argument | Description
---|---
iid_share | Share samples between clients in the IID split mode.
iid_rate | Percentage of samples to share between clients: 0.x (see the sketch below).
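As a rough illustration of these two arguments, the sketch below deals sample indices out evenly to the workers and, when `iid_share` is enabled, copies an `iid_rate` fraction of each client's shard to the other clients. This is an assumed reading of the options, not PyFed's actual splitting code.

```python
import random

def iid_split(num_samples, clients, iid_share=False, iid_rate=0.1, seed=0):
    """Shuffle sample indices and deal them out evenly to `clients` workers."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    shards = [indices[c::clients] for c in range(clients)]
    if iid_share:
        # Assumed behaviour: every client also receives an `iid_rate` fraction
        # of each other client's shard.
        shared = [rng.sample(s, int(iid_rate * len(s))) for s in shards]
        shards = [s + [i for c, sh in enumerate(shared) if c != owner for i in sh]
                  for owner, s in enumerate(shards)]
    return shards

# Example: 60,000 MNIST-sized samples over 5 workers, sharing 10% between clients.
parts = iid_split(60_000, clients=5, iid_share=True, iid_rate=0.1)
print([len(p) for p in parts])  # each worker ends up with 16,800 indices
```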
In the case of a Non-IID distribution (`split_mode = niid`), the following arguments are available:

Argument | Description
---|---
data_size | The number of samples held by each client: Integer.
type | The split type: `random` or `label` split (a label-split sketch follows the launch examples below).
label_num | The number of classes held by a client with the `label` split type: Integer.
share_samples | How to share samples between clients holding the same classes (used with the `label` split type): Integer.
Manually:

```
python run/network/start_websocket_server.py --clients=5 \
    --dataset=mnist \
    --split_mode=niid \
    --type=label \
    --data_size=[234,2134,64,4132,1000] \
    --label_num=[3,8,5,2,3] \
    --share_samples=2
```
Or using a config.yml file:

```
python run/network/start_websocket_server.py -f file_name
```
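To make the label-based Non-IID split more concrete, the sketch below assigns each client samples drawn from a limited number of classes, mirroring the `label_num` values in the command above. It is only a minimal sketch of the idea, assuming one shard per client; the real split in PyFed (including `data_size` and `share_samples` handling) may differ.

```python
import random
from collections import defaultdict

def label_split(labels, label_num_per_client, seed=0):
    """Give client i samples drawn only from `label_num_per_client[i]` classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = list(by_class)
    shards = []
    for n_classes in label_num_per_client:
        chosen = rng.sample(classes, n_classes)
        shards.append([idx for c in chosen for idx in by_class[c]])
    return shards

# Example: a toy 10-class dataset split over clients holding 3, 8, and 5 classes.
toy_labels = [random.randrange(10) for _ in range(1_000)]
for i, shard in enumerate(label_split(toy_labels, [3, 8, 5])):
    print(f"client {i}: {len(shard)} samples")
```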
After launching the workers correctly, we are ready to start the training using the following arguments.
Argument | Description
---|---
model | The file name (without the .py extension) of the model to be trained (see the models directory): `cnn`, `lstm`.
batch_size | The batch size used for training: Integer.
test_batch_size | The batch size used for the test data: Integer.
training_rounds | The number of federated learning rounds: Integer.
federate_after_n_batches | The number of training steps performed on each remote worker before averaging: Integer.
lr | The learning rate: Float.
cuda | Whether to use CUDA: `True` or `False`.
seed | The seed used for randomization: Integer.
eval_every | Evaluate the model every n rounds: Integer.
fraction_client | The number of clients participating in each round: Integer.
optimizer | The optimizer to use: `SGD` or `Adam`.
aggregation | The aggregation method: `federated_avg` or `weighted_avg` (see the sketch after the examples below).
loss | The loss function: `nll_loss` or `cross_entropy`.
Manually:

```
python run/training/main.py --model=cnn \
    --dataset=mnist \
    --batch_size=10 \
    --lr=0.1 \
    --training_rounds=100 \
    --eval_every=10 \
    --optimizer=SGD \
    --aggregation=federated_avg \
    --loss=nll_loss
```
Or using a config.yml file:

```
python run/training/main.py -f file_name
```
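The two `aggregation` options correspond to plain and sample-weighted averaging of the client models. The sketch below illustrates the arithmetic on plain PyTorch state dicts; it is not the implementation PyFed invokes through PySyft.

```python
import torch

def federated_avg(client_states):
    """Plain federated averaging: element-wise mean of the clients' weights."""
    return {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
            for k in client_states[0]}

def weighted_avg(client_states, num_samples):
    """Weighted averaging: each client counts proportionally to its data size."""
    total = sum(num_samples)
    return {k: sum(s[k] * (n / total) for s, n in zip(client_states, num_samples))
            for k in client_states[0]}

# Example: average two toy "models" that each consist of a single 2x2 weight tensor.
a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
print(federated_avg([a, b])["w"])             # 0.5 everywhere
print(weighted_avg([a, b], [300, 100])["w"])  # 0.75 everywhere
```

With `weighted_avg`, clients holding more samples pull the global model further toward their local weights, which matters under the Non-IID splits where `data_size` varies across clients.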
Below are the results obtained with the PyFed framework. You can check all the results and configurations in the experiments package.
Benchmark configuration.
The total number of clients is 100.
Dataset | Model | Epochs | Batch size | Fraction | Learning rate | Rounds
---|---|---|---|---|---|---
Cifar10 | CNN | 1 | 5 | 0.1 | 0.1 | 2500
Fashionmnist | CNN | 1 | 10 | 0.1 | 0.1 | 100
Mnist | CNN | 1 | 10 | 0.1 | 0.1 | 20
Mnist | CNN (with batch normalisation) | 1 | 10 | 0.1 | 0.1 | 20
Sent140 | LSTM | 1 | 1 | 0.1 | 0.1 | 1000
Shakespeare | GRU | 1 | 1 | 0.1 | 0.8 | 2000
Results with the IID distribution:

Dataset | Model | Accuracy (%) | Loss
---|---|---|---
Cifar10 | CNN | 67 | 0.8043
Fashionmnist | CNN | 86.81 | 0.368
Mnist | CNN | 95.63 | 0.1384
Mnist | CNN (with batch normalisation) | 96.33 | 0.1154
Sent140 | LSTM | 65.45 | 0.8345
Shakespeare | GRU | 50.36 | 1.2452
Results with the Non-IID distribution (Types 0, 1, and 2 refer to the label split; the last two columns use the random split):

Dataset | Model | Type 0 Accuracy (%) | Type 0 Loss | Type 1 Accuracy (%) | Type 1 Loss | Type 2 Accuracy (%) | Type 2 Loss | Random Accuracy (%) | Random Loss
---|---|---|---|---|---|---|---|---|---
Cifar10 | CNN | 66.78 | 0.8132 | 65.89 | 0.8453 | 65.45 | 0.8464 | 66.89 | 0.8121
Fashionmnist | CNN | 85.36 | 0.4029 | 85.8 | 0.3956 | 85.42 | 0.4009 | 86.57 | 0.3727
Mnist | CNN | 93.45 | 0.2171 | 93.88 | 0.2164 | 93.84 | 0.2086 | 95.04 | 0.1671
Mnist | CNN (with batch normalisation) | 94.25 | 0.1902 | 94.74 | 0.1771 | 94.76 | 0.1884 | 96.09 | 0.13
Sent140 | LSTM | 64.4 | 0.9244 | 64.23 | 0.9445 | 65.78 | 0.8123 | 65.1 | 0.8663
Shakespeare | GRU | 48.26 | 1.3452 | 48.76 | 1.2052 | 45.23 | 1.7452 | 49.46 | 1.2952
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.