This repository provides the scripts for the paper: Federated Learning via Augmented Knowledge Distillation for Heterogenous Deep Human Activity Recognition Systems
Deep learning-based systems for Human Activity Recognition (HAR) are useful for health monitoring and activity tracking on wearable devices, but training accurate models often requires large and representative datasets. Federated Learning (FL) is a privacy-preserving approach for training deep learning models on data that stays on users' devices, but standard FL is limited to homogeneous model architectures. In this paper, we propose Federated Learning via Augmented Knowledge Distillation (FedAKD) for training heterogeneous models in a distributed setting. FedAKD is evaluated on two HAR datasets and is shown to be more flexible and efficient than standard FL, with up to 20% performance gains for clients and 200X lower communication overhead compared to other FL methods. FedAKD is also more robust under statistical heterogeneity.
Knowledge Distillation (KD) is a technique for transferring knowledge from a trained model to a to-be-trained model. Unlike standard Federated Learning (FL) algorithms such as FedAvg, which communicate model-dependent data (gradients or weights), KD can be used in FL to distill knowledge among heterogeneous clients by communicating soft labels computed on an unlabeled shared dataset.
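As a rough illustration of what a client communicates (this is not code from the repository; it assumes a Keras-style `model` and a NumPy array `pub_data` holding the unlabeled shared dataset):

```python
import numpy as np

# The "knowledge" a client shares is its class-probability predictions
# (soft labels) on the shared unlabeled public dataset.
local_soft_labels = model.predict(pub_data)   # shape: (num_public_samples, num_classes)

# What travels over the network is this small float matrix, whose size is
# independent of the client's model architecture, instead of weights or gradients.
payload_mb = local_soft_labels.astype(np.float32).nbytes / 1e6
print(f"Soft-label payload: {payload_mb:.2f} MB")
```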
Knowledge Distillation-based Federated Learning enables clients to independently design their learning models.
We push KD one step further by using an augmentation algorithm, based on a server-controlled permutation combined with mixup augmentation [1], to distill knowledge more efficiently. The client-side procedure for one global round of FedAKD is sketched below:
# Global round r of FedAKD starts here
# 1. Local training
model.fit(local_data, local_labels, epochs = local_epochs)
# 2. Receive alpha and beta from server
alpha, beta = receive_metadata_from_server(global_round = r)
# 3. mixup augmentation
np.random.seed(beta) # beta is used to set the seed to generate the same augmented version of public data across all nodes
perm = np.random.permutation(len(pub_data))
aug_pub_data = alpha * pub_data + (1-alpha) * pub_data[perm, ...]
# 4. Prepare local knowledge: (1) soft labels on the augmented public data, (2) performance on local test data
# The accuracy is sent so the server can weight each client's soft labels in proportion to its performance
local_soft_labels = model.predict(aug_pub_data)
loss, acc = model.evaluate(test_data, test_labels)
# 5. Send local knowledge to the server, wait for aggregation, then receive global knowledge
send_to_server({'soft labels': local_soft_labels, 'performance': acc})
global_soft_labels = receive_labels_from_server()
# 6. Digest knowledge
model.fit(aug_pub_data, global_soft_labels)
# Global round r of FedAKD ends here
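For completeness, the server's role in a global round can be sketched as follows. This is a hypothetical sketch, not the repository's implementation: the transport helpers (`broadcast_metadata`, `receive_from_client`, `broadcast_labels`) and the choice to sample `alpha` from a Beta distribution are assumptions; broadcasting `alpha`/`beta` and aggregating the soft labels weighted by reported performance follow from the client-side code above.

```python
import numpy as np

def server_round(r, clients, rng=None):
    """Hypothetical server logic for one FedAKD global round (helper names are illustrative)."""
    rng = rng or np.random.default_rng()

    # 1. Draw this round's mixup coefficient and permutation seed, and broadcast them,
    #    so every client builds the same augmented version of the public data.
    alpha = float(rng.beta(0.5, 0.5))          # mixup coefficient in (0, 1); the distribution is an assumption
    beta = int(rng.integers(0, 2**31 - 1))     # shared permutation seed
    broadcast_metadata(clients, global_round=r, alpha=alpha, beta=beta)

    # 2. Collect each client's local knowledge: soft labels and test accuracy.
    payloads = [receive_from_client(c) for c in clients]
    soft_labels = np.stack([p['soft labels'] for p in payloads])        # (num_clients, N, num_classes)
    weights = np.array([p['performance'] for p in payloads], dtype=float)
    weights /= weights.sum()

    # 3. Aggregate: accuracy-weighted average of the clients' soft labels.
    global_soft_labels = np.tensordot(weights, soft_labels, axes=1)     # (N, num_classes)

    # 4. Return the global knowledge to every client for the digest step.
    broadcast_labels(clients, global_soft_labels)
```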
We evaluate FedAKD on the two HAR datasets (HARS and HARB) against a recent KD-based FL algorithm, FedMD [2].
Average accuracy gains of Federated Learning experiments (%)

| Method | HARS (i.i.d.) | HARS (non-i.i.d.) | HARB (i.i.d.) | HARB (non-i.i.d.) |
|---|---|---|---|---|
| FedMD | 24.5 | 7.2 | 11.5 | -2.7 |
| FedAKD (ours) | 25.4 | 27.5 | 12.7 | 0.4 |
[1] Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
[2] Li, D., & Wang, J. (2019). FedMD: Heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581.