RIS-FL

This is the simulation code package for the following paper:

Hang Liu, Xiaojun Yuan, and Ying-Jun Angela Zhang, "Reconfigurable intelligent surface enabled federated learning: A unified communication-learning design approach," to appear in IEEE Transactions on Wireless Communications, 2020. [ArXiv Version]

The package, written in Python 3, reproduces the numerical results of the proposed algorithm in the above paper.

Abstract of Article:

To exploit massive amounts of data generated at mobile edge networks, federated learning (FL) has been proposed as an attractive substitute for centralized machine learning (ML). By collaboratively training a shared learning model at edge devices, FL avoids direct data transmission and thus overcomes high communication latency and privacy issues as compared to centralized ML. To improve the communication efficiency in FL model aggregation, over-the-air computation has been introduced to support simultaneous local model uploads from a large number of devices by exploiting the inherent superposition property of wireless channels. However, due to the heterogeneity of communication capacities among edge devices, over-the-air FL suffers from the straggler issue in which the device with the weakest channel acts as a bottleneck of the model aggregation performance. This issue can be alleviated by device selection to some extent, but the latter still suffers from a tradeoff between data exploitation and model communication. In this paper, we leverage the reconfigurable intelligent surface (RIS) technology to relieve the straggler issue in over-the-air FL. Specifically, we develop a learning analysis framework to quantitatively characterize the impact of device selection and model aggregation error on the convergence of over-the-air FL. Then, we formulate a unified communication-learning optimization problem to jointly optimize device selection, over-the-air transceiver design, and RIS configuration. Numerical experiments show that the proposed design achieves substantial learning accuracy improvement compared with the state-of-the-art approaches, especially when channel conditions vary dramatically across edge devices.

Dependencies

This package is written in Python 3. It requires the following libraries:

  • Python >= 3.5
  • torch
  • torchvision
  • scipy
  • CUDA (if GPU is used)
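
A quick way to check the environment (a minimal sanity test, not part of the package) is shown below.

    import scipy
    import torch
    import torchvision

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("scipy:", scipy.__version__)
    print("CUDA available:", torch.cuda.is_available())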

How to Use

The main file is main.py. It accepts the following user-input parameters via an argument parser (see also the function initial() in main.py):

Parameter   Meaning                                                Default   Type/Range
M           total number of devices                                40        int
N           total number of receive antennas                       5         int
L           total number of RIS elements                           40        int
nit         maximum number of iterations of Algorithm 1 (I_max)   100       int
Jmax        number of iterations for Gibbs sampling                50        int
threshold   early-stopping threshold for Algorithm 1               1e-2      float
tau         SCA regularization term for Algorithm 1                1         float
trial       total number of Monte Carlo trials                     50        int
SNR         signal-to-noise ratio P_0/sigma_n^2, in dB             90.0      float
verbose     print no/important/detailed messages while running     0         0, 1, 2
set         simulation setting to use; see Section V-A             2         1, 2
seed        random seed                                            1         int
gpu         GPU index used for learning (if available)             1         int
momentum    SGD momentum, used only for multiple local updates     0.9       float
epochs      number of training rounds T                            500       int
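
For reference, below is a minimal sketch of how initial() might register these parameters with argparse. The defaults mirror the table; the help strings and parser description are illustrative assumptions, not the package's actual text.

    import argparse

    def initial():
        parser = argparse.ArgumentParser(description="RIS-FL simulation (sketch)")
        parser.add_argument("--M", type=int, default=40, help="total number of devices")
        parser.add_argument("--N", type=int, default=5, help="total number of receive antennas")
        parser.add_argument("--L", type=int, default=40, help="total number of RIS elements")
        parser.add_argument("--nit", type=int, default=100, help="max iterations of Algorithm 1 (I_max)")
        parser.add_argument("--Jmax", type=int, default=50, help="Gibbs sampling iterations")
        parser.add_argument("--threshold", type=float, default=1e-2, help="early-stopping threshold for Algorithm 1")
        parser.add_argument("--tau", type=float, default=1.0, help="SCA regularization term")
        parser.add_argument("--trial", type=int, default=50, help="number of Monte Carlo trials")
        parser.add_argument("--SNR", type=float, default=90.0, help="P_0/sigma_n^2 in dB")
        parser.add_argument("--verbose", type=int, default=0, choices=[0, 1, 2], help="message verbosity")
        parser.add_argument("--set", type=int, default=2, choices=[1, 2], help="simulation setting (Section V-A)")
        parser.add_argument("--seed", type=int, default=1, help="random seed")
        parser.add_argument("--gpu", type=int, default=1, help="GPU index")
        parser.add_argument("--momentum", type=float, default=0.9, help="SGD momentum")
        parser.add_argument("--epochs", type=int, default=500, help="number of training rounds T")
        return parser.parse_args()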

Here is an example of executing the script in a Linux terminal:

python -u main.py --gpu=0 --trial=50 --set=2
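
To queue several runs, e.g. sweeping both simulation settings over a few seeds, a small driver script along the following lines can help. This is a hypothetical convenience wrapper, not part of the package; it only reuses flags from the table above.

    import subprocess

    # Run both simulation settings over three seeds, one process at a time.
    for setting in (1, 2):
        for seed in (1, 2, 3):
            subprocess.run(
                ["python", "-u", "main.py", "--gpu=0", "--trial=50",
                 f"--set={setting}", f"--seed={seed}"],
                check=True,  # stop the sweep if any run fails
            )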

Documentation (please also see each file for more details):

This part will be updated soon.

  • main.py: Initialize the simulation system, optimize the variables, train the learning model, and store the results in store/ as an .npz file

    • initial(): Initialize the parser function to read the user-input parameters
  • optlib.py:

    • Gibbs(): Optimize x, f, and theta via Algorithm 2, on top of the following two functions (a generic skeleton of such a sampler appears after this list)
    • find_obj_inner(): Given x, compute the objective value by executing sca_fmincon()
    • sca_fmincon(): Given the device selection decision x, optimize f and theta via Algorithm 1
  • flow.py:

    • learning_flow(): Read the optimization result, initialize the learning model, and perform training and testing on top of Learning_iter()
    • Learning_iter(): Given a learning model, compute the gradients, update the training models, and perform testing on top of train_script.py
    • FedAvg_grad(): Given the aggregated global gradient and the current model, update the global model by Eq. (4); see the toy sketch after this list
  • Nets.py:

    • CNNMnist(): Specify the convolutional neural network structure used for learning
  • AirComp.py:

    • transmission(): Given the local gradients, perform over-the-air model aggregation; see Section II-C and the toy sketch after this list
  • train_script.py:

    • Load_FMNIST_IID(): Download (if needed) and load the Fashion-MNIST data, and distribute them to the local devices
    • local_update(): Given a learning model and the distributed training data, compute the local gradients/model changes
    • test_model(): Given a learning model, test the accuracy/loss based on certain test images
  • Monte_Carlo_Averaging.py: Load the .npz files from store/ and average the results over the Monte Carlo trials

  • data/: Store the Fashion-MNIST dataset. When the code runs for the first time, it automatically downloads the dataset from the Internet.

  • store/: Store output files (*.npz)
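
For intuition on the structure of the Gibbs-sampling step in Algorithm 2, here is a generic skeleton for optimizing a binary device-selection vector x. It is a hedged sketch, not the package's Gibbs(): the objective obj(x) is a placeholder for what find_obj_inner()/sca_fmincon() compute, and the temperature schedule is an illustrative assumption.

    import numpy as np

    def gibbs_selection(obj, M, J_max=50, temp=1.0, cooling=0.9, seed=1):
        """Coordinate-wise Gibbs sampling over a binary selection vector x.

        obj(x) -> float stands in for the inner objective (the role played
        by find_obj_inner()/sca_fmincon() in this package).
        """
        rng = np.random.default_rng(seed)
        x = np.ones(M, dtype=int)            # start with all devices selected
        best_x, best_val = x.copy(), obj(x)
        for _ in range(J_max):
            for m in range(M):               # resample one coordinate at a time
                cand = x.copy()
                cand[m] ^= 1                 # flip device m's selection bit
                f_cur, f_cand = obj(x), obj(cand)
                # Boltzmann probability of accepting the flipped state
                # (difference form, clipped for numerical safety).
                p = 1.0 / (1.0 + np.exp(np.clip((f_cand - f_cur) / temp, -50, 50)))
                if rng.random() < p:
                    x, f_cur = cand, f_cand
                if f_cur < best_val:
                    best_x, best_val = x.copy(), f_cur
            temp *= cooling                  # anneal toward greedy flips
        return best_x, best_val

    # Toy usage with a made-up objective favoring about half the devices:
    x_opt, val = gibbs_selection(lambda x: (x.sum() - 5) ** 2, M=10)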
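
Likewise, here is a toy, self-contained sketch of over-the-air aggregation followed by a global update in the spirit of Eq. (4). It is not the package's transmission() or FedAvg_grad(): the scalar effective channels, channel-inversion scaling, equal local sample sizes, and noise level are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    M, D = 10, 100                    # selected devices, model dimension
    K = np.full(M, 600)               # local sample sizes (assumed equal)
    grads = rng.standard_normal((M, D))                       # local gradients g_m
    h = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # effective channels
    eta = np.abs(h).min()             # scaling limited by the weakest channel
    sigma = 1e-3                      # receiver noise level (assumption)

    # Each device pre-scales its gradient by K_m * eta / h_m so the signals
    # superpose coherently at the receiver (channel inversion).
    tx = (K * eta / h)[:, None] * grads
    noise = sigma * (rng.standard_normal(D) + 1j * rng.standard_normal(D))
    y = tx.sum(axis=0) + noise

    # De-scale to recover a noisy estimate of the weighted-average gradient.
    g_hat = np.real(y) / (eta * K.sum())

    # Global update in the spirit of Eq. (4): w <- w - lr * aggregated gradient.
    lr, w = 0.01, np.zeros(D)
    w = w - lr * g_hat

    exact = (K[:, None] * grads).sum(axis=0) / K.sum()
    print("aggregation error:", np.linalg.norm(g_hat - exact))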

Referencing

If you use this code in any way for research that results in publications, please cite our original article listed above.