This repository provides a framework for building network topologies and testing decentralized machine learning models.
The code in this repository was tested with the following Python package versions:
- numpy==1.19.5
- torchvision==0.9.1
- matplotlib==3.4.1
- opacus==0.13.0
- sklearn==0.0
- scikit-learn==0.24.2
- networkx==2.5.1
- prettytable==2.1.0
- torch==1.8.1
- jupyter==1.0.0
Install the dependencies with:
pip3 install -r requirements.txt
Run the run.py file in the notebook to start running scenarios. To change which tests are run, edit the parameter dictionary. See the next section for an explanation of the parameters. The existing parameters correspond to a run of 24 scenarios.
The output of each scenario is saved in the results directory with the timestamp at which it finished running.
To run a test, specify every parameter as a list of values. All combinations of these values (one value taken from each list) are then run sequentially, one scenario per combination.
In the list below, the brackets after each parameter name show its possible values.
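The expansion of parameter lists into scenarios can be sketched with `itertools.product`. This is a minimal illustration, not the code in run.py: the key names are taken from this README, while the values and the `expand` helper are made up for the example.

```python
from itertools import product

# Hypothetical parameter dictionary: key names come from this README,
# the values are chosen only for illustration.
params = {
    "graph_list": ["FullyConnectedGraph", "CycleGraph"],
    "nr_node_list": [4, 8],
    "lr_list": [0.01, 0.1, 1.0],
}

def expand(param_lists):
    """Return one dict per combination of the listed values."""
    keys = list(param_lists)
    return [dict(zip(keys, values)) for values in product(*param_lists.values())]

scenarios = expand(params)
print(len(scenarios))  # 2 * 2 * 3 = 12
print(scenarios[0])
```

The number of scenarios is the product of the list lengths, so adding one more value to any list multiplies the total run time accordingly.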
graph_list: ["FullyConnectedGraph", "BinomialGraph", "RingOfCliques", "CirculantGraph", "CycleGraph", "Torus2D"]
Choice of network topology. This decides how the nodes are laid out and which nodes communicate with each other.
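To make "which nodes communicate with each other" concrete, here is a small sketch that builds neighbor lists for three of the topologies in plain Python. The function names are hypothetical and this is not the implementation in graph.py; it only illustrates the idea.

```python
def fully_connected_neighbors(n):
    """Every node communicates with every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def cycle_neighbors(n):
    """Each node communicates with its two neighbors on a ring (n >= 3)."""
    return {i: sorted({(i - 1) % n, (i + 1) % n}) for i in range(n)}

def torus_2d_neighbors(rows, cols):
    """Grid with wrap-around edges; uses exactly rows * cols nodes."""
    def idx(r, c):
        # Map (row, column) to a node index, wrapping around the edges.
        return (r % rows) * cols + (c % cols)
    return {
        idx(r, c): sorted({idx(r - 1, c), idx(r + 1, c), idx(r, c - 1), idx(r, c + 1)})
        for r in range(rows)
        for c in range(cols)
    }

print(fully_connected_neighbors(4)[0])  # [1, 2, 3]
print(cycle_neighbors(5)[0])            # [1, 4]
print(torus_2d_neighbors(3, 3)[4])      # [1, 3, 5, 7]
```

In this sketch the torus needs a node count that factors into rows times columns, which is one example of a topology that only works for certain numbers of nodes.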
task_list: ["MNIST"]
Which dataset to use. Currently only MNIST is implemented.
nr_node_list: [Natural Numbers]
Number of nodes to create. Note that some network topologies only support certain numbers of nodes.
nr_classes_list: [0-10] for MNIST
Only has an effect if data_distribution is one of the non_iid options ('non_iid_uniform' or 'non_iid_random'). Determines the number of class labels that are given to each node.
data_distribution: ['uniform', 'random', 'non_iid_uniform', 'non_iid_random']
How the data is split across nodes. 'uniform' means that all nodes are given the same number of samples; 'random' means that the samples are randomly partitioned. With the non_iid options the data is not shuffled, and each node is given samples from only a limited number of class labels, as defined by nr_classes_list.
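The difference between a uniform split and a non-IID split can be sketched as follows. The helper names and the round-robin choice of classes are assumptions made for this example and are not taken from data.py; in this simplified version, nodes that hold the same class also share its samples.

```python
def split_uniform(indices, nr_nodes):
    """Give every node the same number of samples (any remainder is dropped)."""
    per_node = len(indices) // nr_nodes
    return [indices[k * per_node:(k + 1) * per_node] for k in range(nr_nodes)]

def split_non_iid(labels, nr_nodes, nr_classes):
    """Give each node only the samples of nr_classes class labels.

    Classes are assigned round-robin (an assumption for this sketch).
    """
    classes = sorted(set(labels))
    shards = []
    for k in range(nr_nodes):
        chosen = {classes[(k * nr_classes + j) % len(classes)] for j in range(nr_classes)}
        shards.append([i for i, y in enumerate(labels) if y in chosen])
    return shards

labels = [0, 0, 1, 1, 2, 2, 3, 3]  # toy labels standing in for MNIST digits
print(split_uniform(list(range(10)), 3))                 # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(split_non_iid(labels, nr_nodes=2, nr_classes=2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```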
lr_list: [Real Numbers]
Learning rate used for training.
training_epochs: [Natural Numbers]
Number of training epochs to run.
test_granularity: [Natural Numbers]
Frequency, in epochs, with which the network is evaluated on the test data set. The evaluation is triggered by a modulo check between the epoch counter and test_granularity.
add_privacy_list: [Boolean]
Boolean flag that enables training with differential privacy.
epsilon_list: [Real Numbers]
Only has an effect if add_privacy_list is True. The privacy budget epsilon of the DP-SGD algorithm; smaller values correspond to a stronger privacy guarantee.
delta_list: [Real Numbers]
Only has an effect if add_privacy_list is True. The delta of the (epsilon, delta) differential privacy guarantee of the DP-SGD algorithm, i.e. the probability with which the epsilon guarantee may fail to hold.
subset: [Boolean]
Whether to train on only 30% of the training data. Used to save time.
Run the plots.ipynb and tables.ipynb notebooks to generate the results from our paper.
Here is the file structure of the project:
Project
|
|-- data -- |
|           |-- MNIST -- |
|                        |-- processed
|                        |-- raw
|-- plots -- |
|-- results -- |
|-- .gitignore
|-- auxiliary.py
|-- data.py
|-- decentralized_network.py
|-- decentralized_test.py
|-- graph.py
|-- network_test.py
|-- network.py
|-- node.py
|-- plots.ipynb
|-- README.md
|-- requirements.txt
|-- run.py
|-- tables.ipynb
|-- test_class.py
- auxiliary.py: functions for counting the parameters of a model and loading data
- data.py: class for generating data and bringing it into a suitable form for the setup at hand
- decentralized_network.py: class representing the complete decentralized network, encompassing many nodes from node.py
- decentralized_test.py: test suite for the decentralized network class
- graph.py: graph topology classes
- network_test.py: test suite for the network class from network.py
- network.py: network class, used in the node classes
- node.py: classes for nodes; each one represents one agent in the decentralized network
- plots.ipynb: for creating the plots
- run.py: for executing multiple runs
- tables.ipynb: for creating the tables
- test_class.py: class for creating multiple test setups and saving the results
- Authors: Alec Flowers, Devrim Celik, Nina Mainusch