
Heterogeneous Federated Learning: State-of-the-art and Research Challenges

Heterogeneous Federated Learning

Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao

Survey of Heterogeneous Federated Learning by the MARS Group at Wuhan University, led by Prof. Mang Ye. (Continuously updated! ⭐)

Contributions are welcome!

Table of Contents

Our Works

Federated Learning Survey

Federated Learning with Domain Shift

Federated Learning with Data Noise

Federated Learning with Heterogeneous Graph

Personalized Federated Learning

Federated Learning with Privacy Mechanisms

Federated Learning with Few-Shot

HFL Survey

Overview


Research Challenges

Statistical Heterogeneity

Statistical heterogeneity refers to the case where the data distributions across clients in federated learning are inconsistent and are not drawn from the same underlying distribution, i.e., the data are Non-IID.
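
In experiments, this Non-IID setting is often simulated by partitioning a centralized dataset with a Dirichlet prior over the label proportions. Below is a minimal, illustrative sketch of such a partitioner (the function name and parameters are ours, not from the survey); a smaller `alpha` yields more skewed per-client label distributions, while a large `alpha` approaches an IID split.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices into Non-IID client shards using a Dirichlet prior."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(labels.max() + 1):
        idx_c = np.flatnonzero(labels == c)
        rng.shuffle(idx_c)
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        split_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, shard in enumerate(np.split(idx_c, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices
```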


Model Heterogeneity

Model heterogeneity refers to the setting in federated learning where participating clients may hold local models with different architectures.

Communication Heterogeneity

Devices are typically deployed in different network environments and have different network connectivity (3G, 4G, 5G, Wi-Fi), which leads to inconsistent communication bandwidth, latency, and reliability, i.e., communication heterogeneity.

Device Heterogeneity

Differences in device hardware (CPU, memory, battery life) lead to varying storage and computation capabilities across devices, which inevitably results in device heterogeneity.

Additional Challenges

Knowledge Transfer Barrier

Federated learning aims to transfer knowledge among different clients to collaboratively learn models with superior performance. However, the four types of heterogeneity described above create barriers to such knowledge transfer.

Privacy Leakage

Federated learning by itself cannot guarantee perfect data security; potential privacy risks remain. Moreover, the four types of heterogeneity above inevitably exacerbate privacy leakage at different stages of learning.

State-Of-The-Art

Data-Level

Private Data Processing

  1. Data Preparation
  2. Data Privacy Protection (see the sketch after this list)
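
As a toy illustration of the data privacy protection category, the sketch below perturbs gradients before a model update by clipping their norm and adding Gaussian noise. This is a simplified, per-batch approximation of the Gaussian mechanism (formal DP-SGD clips per-sample gradients and tracks a privacy budget); the function name and parameters are illustrative, not from the survey.

```python
import torch

def clip_and_noise_gradients(model, clip_norm=1.0, noise_multiplier=1.0):
    """Clip the batch gradient norm, then add Gaussian noise scaled to the clip bound.

    Intended to be called between loss.backward() and optimizer.step().
    """
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    total_norm = torch.sqrt(sum((g.detach() ** 2).sum() for g in grads))
    scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))
    for g in grads:
        g.mul_(scale)  # clip: rescale so the total norm is at most clip_norm
        g.add_(torch.randn_like(g) * noise_multiplier * clip_norm)  # perturb
```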

External Data Utilization

  1. Knowledge Distillation
  1. Unsupervised Representation Learning

Model-Level

Federated Optimization

  1. Regularization (see the FedProx-style sketch after this list)
  2. Meta Learning
  3. Multi-task Learning
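
To make the regularization idea concrete, here is a minimal sketch of a FedProx-style local update, in which a proximal term penalizes the distance between the client's model and the current global model during local training. The helper name and hyperparameters are ours, not a reference implementation.

```python
import torch

def local_update_with_prox(model, global_model, loader, loss_fn,
                           mu=0.01, lr=0.01, local_epochs=1):
    """One client's local training with a FedProx-style proximal term.

    The extra term (mu / 2) * ||w - w_global||^2 keeps the local model from
    drifting too far from the global model under Non-IID data.
    """
    global_params = [p.detach().clone() for p in global_model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(local_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            # Proximal regularization toward the frozen global weights.
            prox = sum(((p - g) ** 2).sum()
                       for p, g in zip(model.parameters(), global_params))
            (loss + 0.5 * mu * prox).backward()
            optimizer.step()
    return model
```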

Knowledge Transfer

  1. Knowledge Distillation (see the sketch after this list)
  2. Transfer Learning
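
Distillation-based knowledge transfer typically lets heterogeneous client models exchange predictions on a shared public dataset instead of model weights. The sketch below assumes FedMD-style consensus logits (e.g., the average of all clients' predictions on the same public batches) are already available; the function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_from_consensus(student, public_batches, consensus_logits,
                           T=2.0, lr=0.01, epochs=1):
    """Fit a client (student) model to consensus logits on shared public data.

    public_batches: list of input tensors from a public dataset.
    consensus_logits: matching list of aggregated teacher logits.
    """
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, teacher_logits in zip(public_batches, consensus_logits):
            optimizer.zero_grad()
            student_log_probs = F.log_softmax(student(x) / T, dim=1)
            teacher_probs = F.softmax(teacher_logits / T, dim=1)
            # Temperature-scaled KL divergence between student and consensus.
            loss = F.kl_div(student_log_probs, teacher_probs,
                            reduction="batchmean") * (T * T)
            loss.backward()
            optimizer.step()
    return student
```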

Architecture Sharing

  1. Backbone Sharing (see the sketch after this list)
  2. Classifier Sharing
  3. Other Part Sharing
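
A common backbone-sharing strategy (in the spirit of FedPer) aggregates only the shared feature extractor and keeps each client's classifier head personalized. The sketch below is a minimal illustration under that assumption; the class and function names are ours.

```python
import torch
import torch.nn as nn

class ClientModel(nn.Module):
    """Toy client model split into a shared backbone and a private classifier."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
        self.classifier = nn.Linear(128, num_classes)  # stays personalized

    def forward(self, x):
        return self.classifier(self.backbone(x))

def aggregate_backbones(clients):
    """FedAvg over the shared backbones only; classifiers never leave the clients."""
    avg_state = {}
    for key in clients[0].backbone.state_dict():
        avg_state[key] = torch.stack(
            [c.backbone.state_dict()[key].float() for c in clients]).mean(dim=0)
    for c in clients:
        c.backbone.load_state_dict(avg_state)
    return clients
```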

Server-Level

  1. Client Selection
  2. Client Clustering (see the sketch after this list)
  3. Decentralized Communication
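
As one illustration of server-side client clustering, the server can group clients whose model updates point in similar directions and then aggregate within each group. The sketch below uses k-means on L2-normalized update vectors (an approximation of cosine-similarity clustering); the function name and inputs are assumptions, not taken from any specific paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients_by_update(client_updates, num_clusters=3, seed=0):
    """Group clients with similar update directions.

    client_updates: list of 1-D numpy arrays, e.g., the difference between each
    client's uploaded weights and the current global weights, flattened.
    """
    updates = np.stack(client_updates)
    # Normalize so that k-means on these vectors approximates clustering
    # by cosine similarity of the raw updates.
    norms = np.linalg.norm(updates, axis=1, keepdims=True) + 1e-12
    kmeans = KMeans(n_clusters=num_clusters, random_state=seed, n_init=10)
    return kmeans.fit_predict(updates / norms)
```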

Future Directions

Improving Communication Efficiency

Federated Fairness

Privacy Protection

Attack Robustness

  1. Attack Methods
  2. Defense Strategies

Uniform Benchmark

  1. General Federated Learning Systems
  • FedML - FedML: A Research Library and Benchmark for Federated Machine Learning

    FedML is a research library that supports distributed training, mobile on-device training, and stand-alone simulation training. It provides standardized implementations of many existing federated learning algorithms, as well as standardized benchmark settings for a variety of datasets, including Non-IID partition methods, numbers of devices, and baseline models.

  • FedScale - FedScale: Benchmarking Model and System Performance of Federated Learning at Scale (ICML 2022)

    FedScale is a federated learning benchmark suite that provides real-world datasets covering a wide range of federated learning tasks, including image classification, object detection, language modeling, and speech recognition. Additionally, FedScale includes a scalable and extensible FedScale Runtime to enable and standardize real-world end-point deployments of federated learning.

  • OARF - The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems (ACM TIST 2020)

    OARF leverages public datasets collected from different sources to simulate real-world data distributions. In addition, OARF quantitatively studies the preliminary relationship among various design metrics such as data partitioning and privacy mechanisms in federated learning systems.

  • FedEval - FedEval: A Holistic Evaluation Framework for Federated Learning

    FedEval is a federated learning evaluation framework with five metrics: accuracy, communication, time consumption, privacy, and robustness. It is implemented and evaluated on two of the most widely used algorithms, FedSGD and FedAvg.

  2. Specific Federated Learning Systems
  • FedReIDBench - Performance Optimization of Federated Person Re-identification via Benchmark Analysis (ACM MM 2020)

    FedReIDBench is a new benchmark for applying federated learning to person re-identification (ReID), covering nine different datasets and two federated scenarios. Specifically, the two scenarios are the federated-by-camera scenario and the federated-by-dataset scenario, which correspond to the standard server-client architecture and the client-edge-cloud architecture, respectively.

  • pFL-Bench - pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning (NeurIPS 2022 Datasets and Benchmarks Track)

    pFL-Bench is a benchmark for personalized federated learning that covers twelve dataset variants, including image, text, graph, and recommendation data, with unified data partitioning and realistic heterogeneous settings. pFL-Bench also provides more than 20 competitive personalized federated learning baseline implementations for standardized evaluation.

  • FedGraphNN - FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks (ICLR 2021 Workshop on DPML)

    FedGraphNN is a benchmark system built on a unified formulation of graph federated learning, including extensive datasets from seven different fields, popular Graph Neural Network (GNN) models and federated learning algorithms.

  3. Datasets
  • LEAF - LEAF: A Benchmark for Federated Settings (NeurIPS 2019 Workshop)

    LEAF contains six federated datasets covering different fields, including image classification (FEMNIST, Synthetic Dataset), image recognition (CelebA), sentiment analysis (Sentiment140), and next-character prediction (Shakespeare, Reddit). In addition, LEAF provides two sampling methods, 'IID' and 'Non-IID', to divide each dataset among different clients.

  • Street Dataset - Real-World Image Datasets for Federated Learning (FL-NeurIPS 2019)

    This work introduces a federated dataset for object detection. The dataset contains over 900 images captured by 26 street cameras, with 7 object categories annotated with detailed bounding boxes. In addition, the paper provides data partitions over 5 or 20 clients, whose data distributions are Non-IID and unbalanced, reflecting the characteristics of real-world federated learning scenarios.