This repository provides the scripts for the paper: Federated Learning via Augmented Knowledge Distillation for Heterogenous Deep Human Activity Recognition Systems
Deep learning-based systems for Human Activity Recognition (HAR) are useful for health monitoring and activity tracking on wearable devices, but training accurate models often requires large and representative datasets. Federated Learning (FL) is a privacy-preserving approach for training deep learning models on data that stays on users' devices, but standard FL is limited to homogeneous model architectures. In this paper, we propose Federated Learning via Augmented Knowledge Distillation (FedAKD) for training heterogeneous models in a distributed setting. FedAKD is evaluated on two HAR datasets and is shown to be more flexible and efficient than standard FL, with up to 20% performance gains for clients and 200X lower communication overhead compared to other FL methods. FedAKD is also more robust under statistical heterogeneity.
Knowledge Distillation (KD) is a technique for transferring knowledge from a trained model to a to-be-trained model. Unlike standard Federated Learning (FL) algorithms such as FedAvg, which communicate model-dependent data (gradients or weights), KD can be used in FL to distill knowledge among heterogeneous clients by communicating soft labels computed on an unlabeled shared dataset.
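As a rough illustration of what a client communicates (this is not code from the repository; it assumes a Keras-style `model` and a NumPy array `pub_data` holding the unlabeled shared dataset):

```python
import numpy as np

# The "knowledge" a client shares is its class-probability predictions
# (soft labels) on the shared unlabeled public dataset.
local_soft_labels = model.predict(pub_data)   # shape: (num_public_samples, num_classes)

# What travels over the network is this small float matrix, whose size is
# independent of the client's model architecture, instead of weights or gradients.
payload_mb = local_soft_labels.astype(np.float32).nbytes / 1e6
print(f"Soft-label payload: {payload_mb:.2f} MB")
```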
Knowledge Distillation-based Federated Learning enables clients to independently design their learning models.
We push KD one step further by using an augmentation algorithm, based on a server-controlled permutation combined with mixup augmentation [1], to distill knowledge more efficiently. The client-side procedure for one global round of FedAKD is sketched below:
# Global round r of FedAKD starts here
# 1. Local training
model.fit(local_data, local_labels, epochs = local_epochs)
# 2. Receive alpha and beta from server
alpha, beta = receive_metadata_from_server(global_round = r)
# 3. mixup augmentation
np.random.seed(beta) # beta is used to set the seed to generate the same augmented version of public data across all nodes
perm = np.random.permutation(len(pub_data))
aug_pub_data = alpha * pub_data + (1-alpha) * pub_data[perm, ...]
# 4. Prepare local knowledge: (1) soft labels on the augmented public data, (2) performance on local test data
# The accuracy is sent so the server can weight each client's soft labels in proportion to its performance
local_soft_labels = model.predict(aug_pub_data)
loss, acc = model.evaluate(test_data, test_labels)
# 5. Send local knowledge to the server, wait for aggregation, then receive global knowledge
send_to_server({'soft labels': local_soft_labels, 'performance': acc})
global_soft_labels = receive_labels_from_server()
# 6. Digest knowledge
model.fit(aug_pub_data, global_soft_labels)
# Global round r of FedAKD ends here
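For completeness, the server's role in a global round can be sketched as follows. This is a hypothetical sketch, not the repository's implementation: the transport helpers (`broadcast_metadata`, `receive_from_client`, `broadcast_labels`) and the choice to sample `alpha` from a Beta distribution are assumptions; broadcasting `alpha`/`beta` and aggregating the soft labels weighted by reported performance follow from the client-side code above.

```python
import numpy as np

def server_round(r, clients, rng=None):
    """Hypothetical server logic for one FedAKD global round (helper names are illustrative)."""
    rng = rng or np.random.default_rng()

    # 1. Draw this round's mixup coefficient and permutation seed, and broadcast them,
    #    so every client builds the same augmented version of the public data.
    alpha = float(rng.beta(0.5, 0.5))          # mixup coefficient in (0, 1); the distribution is an assumption
    beta = int(rng.integers(0, 2**31 - 1))     # shared permutation seed
    broadcast_metadata(clients, global_round=r, alpha=alpha, beta=beta)

    # 2. Collect each client's local knowledge: soft labels and test accuracy.
    payloads = [receive_from_client(c) for c in clients]
    soft_labels = np.stack([p['soft labels'] for p in payloads])        # (num_clients, N, num_classes)
    weights = np.array([p['performance'] for p in payloads], dtype=float)
    weights /= weights.sum()

    # 3. Aggregate: accuracy-weighted average of the clients' soft labels.
    global_soft_labels = np.tensordot(weights, soft_labels, axes=1)     # (N, num_classes)

    # 4. Return the global knowledge to every client for the digest step.
    broadcast_labels(clients, global_soft_labels)
```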
We evaluate FedAKD on the two HAR datasets (HARS and HARB) against a recent KD-based FL algorithm, FedMD [2].
Average accuracy gains of Federated Learning experiments (%)

| Method | HARS (i.i.d.) | HARS (non-i.i.d.) | HARB (i.i.d.) | HARB (non-i.i.d.) |
|---|---|---|---|---|
| FedMD | 24.5 | 7.2 | 11.5 | -2.7 |
| FedAKD (ours) | 25.4 | 27.5 | 12.7 | 0.4 |
[1] Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
[2] Li, D., & Wang, J. (2019). FedMD: Heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581.