Combining Ubiquitous ML Models in IoT

The concept of ML model aggregation rather than data aggregation has gained much attention, as it boosts prediction performance while maintaining stability and preserving privacy. In a non-ideal scenario, a base model trained on a single device may make independent but complementary errors. To handle such cases, this repo implements 8 robust ML model combining methods that achieve reliable prediction results by combining numerous base models (trained on many devices) to form a central model that effectively limits errors, built-in randomness, and uncertainties.

The contributions of this work can be summarized as follows:

Studies on centralized learning, split learning, and distributed ensemble learning have extensively investigated combining models trained on devices like smartphones, Raspberry Pis, Jetson Nanos, etc. Such devices have sufficient resources to train base models (or ensembles) using standard training algorithms from Python Scikit-learn or lightweight ML frameworks like TensorFlow Lite. In contrast, we aim to achieve collective intelligence using MCUs, since billions of deployed IoT devices like HVAC controllers, smart meters, and video doorbells have resource-constrained MCU-based hardware with only a few MB of memory.
From the large number of available studies, we choose, implement, and provide 8 robust ML model combining methods that are compatible with a wide range of datasets (varying feature dimensions and classes) and IoT devices (heterogeneous hardware specifications). We open-source the implementation so that researchers and engineers can start practicing distributed ensemble learning by combining ML base models trained on ubiquitous IoT devices.

Medical Data Usecase
Algorithms for Combining ML Models
- Dependencies
Devices and Datasets for Experiments
Experiments: Distributed Train then Combine
Useful Books, Toolboxes and Datasets
- Books
- Toolboxes
- Datasets
Classic Papers
Source and Ranking Portals
Reputed Data Mining Conferences/Workshops/Journals
- Conferences and Workshops
- Journals
Doing Good Research and Get it Published

Medical Data Usecase

Providing sensitive medical data for research is a potential use case where combining ML models can be applied.

The data required for most research is sensitive in nature, as it concerns private individuals. The GDPR therefore restricts sending such sensitive yet valuable medical data (from hospitals, imaging centers) to research institutes. As shown in the above Fig, when resource-constrained medical devices like insulin-delivery devices and BP apparatus are equipped with IoT hardware-friendly training algorithms like ML-MCU or Train++, they can train base models onboard, without depending on the hospital’s local servers. After training, the base models from similar devices can be extracted, combined, and sent to research labs with improved data privacy preservation. For example, the 2 base models M7₁, M7₂ (see above Fig) trained on ECG monitors using patients' vital data can be combined centrally, then shared for research.

Algorithms for Combining ML Models

To enable combining ML models rather than combining distributed data, we select, implement, and provide 8 robust methods that apply to a variety of IoT use-case data and are also suitable for combining models trained on heterogeneous IoT devices.

Algorithm	Source	Implementation Code
Simple Average	Ensemble Methods: Foundations and Algorithms	Average_Maximization_Voting_Median.py
Weighted Average: Average across all scores/prediction results	Ensemble Methods: Foundations and Algorithms	Average_Maximization_Voting_Median.py
Maximization: Simple combination by taking the maximum scores	Ensemble Methods: Foundations and Algorithms	Average_Maximization_Voting_Median.py
Weighted Majority Vote	Ensemble Methods: Foundations and Algorithms	Average_Maximization_Voting_Median.py
Median: Take the median value across all scores/prediction results	Ensemble Methods: Foundations and Algorithms	Average_Maximization_Voting_Median.py
Dynamic Classifier Selection (DCS)	Combination of multiple classifiers using local accuracy estimates	DCS-LA.py
Dynamic Ensemble Selection (DES)	From dynamic classifier selection to dynamic ensemble selection	DES-LA.py
Stacking (meta ensembling): Use a meta learner to learn the base classifier results	A Kaggler’s Guide to Model Stacking in Practice	Stacking.py
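
The five score-level methods listed for Average_Maximization_Voting_Median.py reduce to simple array operations. The following is a minimal NumPy sketch of those operations, not the repo's implementation: the function names, the example scores, and the weights are illustrative assumptions.

```python
import numpy as np


def simple_average(scores):
    """Mean of the base-model scores for each sample."""
    return scores.mean(axis=1)


def weighted_average(scores, weights):
    """Weighted mean of the scores; weights are normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return scores @ (w / w.sum())


def maximization(scores):
    """Highest base-model score for each sample."""
    return scores.max(axis=1)


def median(scores):
    """Median base-model score for each sample."""
    return np.median(scores, axis=1)


def weighted_majority_vote(labels, weights, n_classes):
    """Class with the largest total weight among the base-model votes.

    labels: (n_samples, n_models) integer class predictions.
    """
    w = np.asarray(weights, dtype=float)
    rows = np.arange(labels.shape[0])
    votes = np.zeros((labels.shape[0], n_classes))
    for m in range(labels.shape[1]):
        # Add model m's weight to the class it voted for, per sample.
        votes[rows, labels[:, m]] += w[m]
    return votes.argmax(axis=1)


# Hypothetical positive-class scores: 4 samples (rows) x 3 base models (columns).
scores = np.array([
    [0.9, 0.8, 0.4],
    [0.2, 0.1, 0.6],
    [0.7, 0.3, 0.5],
    [0.1, 0.2, 0.3],
])
weights = [0.5, 0.3, 0.2]             # assumed per-model weights
labels = (scores >= 0.5).astype(int)  # hard votes derived from the scores

print(simple_average(scores))
print(weighted_average(scores, weights))
print(maximization(scores))
print(median(scores))
print(weighted_majority_vote(labels, weights, n_classes=2))
```

In practice the weights would typically come from each base model's validation performance; here they are fixed by hand for illustration.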

Dependencies

Python 3.5, 3.6, or 3.7
joblib
matplotlib (optional for running examples)
numpy>=1.13
numba>=0.35
pyod
scipy>=0.19.1
scikit_learn>=0.20

Devices and Datasets for Experiments

Devices: Distributed, ubiquitous IoT devices in the real world have heterogeneous hardware specifications. To replicate this scenario, the devices chosen to carry out the distributed training, given in the Table below, comprise 10 resource-constrained MCU boards (B1-B10) and 5 CPU devices (C1-C5).

MCUs
Board#: Name	Specs: Processor, flash, SRAM, clock (MHz)
B1: nRF52840 Feather	Cortex-M4, 1MB, 256KB, 64
B2: STM32f10 Blue Pill	Cortex-M0, 128KB, 20KB, 72
B3: Adafruit HUZZAH32	Xtensa LX6, 4MB, 520KB, 240
B4: Raspberry Pi Pico	Cortex-M0+, 16MB, 264KB, 133
B5: ATSAMD21 Metro	Cortex-M0+, 256KB, 32KB, 48
B6: Arduino Nano 33	Cortex-M4, 1MB, 256KB, 64
B7: Teensy 4.0	Cortex-M7, 2MB, 1MB, 600
B8: STM32 Nucleo H7	Cortex-M7, 2MB, 1MB, 480
B9: Feather M4 Express	Cortex-M4, 2MB, 192KB, 120
B10: Arduino Portenta	Cortex-M7+M4, 2MB, 1MB, 480

CPUs
CPU#: Name	Basic specs
C1: W10 Laptop	Intel Core i7 @1.9GHz
C2: NVIDIA Jetson Nano	128-core GPU @1.4GHz
C3: W10 Laptop	Intel Core i5 @1.6GHz
C4: Ubuntu Laptop	Intel Core i7 @2.4GHz
C5: Raspberry Pi 4	Cortex-A72 @1.5GHz

Datasets: The following datasets are used for training on the above MCUs and CPUs.

Banknote Authentication (5 features, 2 classes, 1372 samples)
Haberman's Survival (3 features, 2 classes, 306 samples)
Titanic (11 features, 2 classes, 1300 samples)

Experiments: Distributed Train then Combine

Procedure

The training process on all 15 devices is carried out using the resource-friendly classifier training algorithm from ML-MCU.

Initially, for the Banknote dataset, once all devices complete training, 15 base models are obtained (first set). Then, each of the 8 ML model combining methods is applied in turn to this first set of models, producing 8 central models (one per combining method). The same procedure is followed for the remaining datasets, producing the second and third sets of models, followed by model combining. At this stage, there are 8 central models for each dataset, whose performance is evaluated in terms of Accuracy, ROC, and F1 score (F1) metrics and reported in the below Fig.
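
The train-then-combine procedure can be sketched on a single machine as follows. This is only an illustration under several assumptions: synthetic data from scikit-learn stands in for the Banknote dataset, LogisticRegression stands in for the ML-MCU training algorithm, the 15 devices are simulated by 15 disjoint data shards, ROC is taken to mean the area under the ROC curve, and Simple Average is the only combining method shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

N_DEVICES = 15  # 10 MCU boards + 5 CPU devices in the experiments

# Assumption: synthetic binary data in place of a real dataset.
X, y = make_classification(
    n_samples=1500, n_features=5, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: each simulated device trains on its own disjoint shard.
order = np.random.RandomState(0).permutation(len(X_train))
shards = np.array_split(order, N_DEVICES)

# Step 1: train one base model per device.
base_models = [
    LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    for idx in shards
]

# Step 2: collect each base model's positive-class score on the test set.
scores = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

# Step 3: combine the base models (Simple Average shown here).
combined = scores.mean(axis=1)
y_pred = (combined >= 0.5).astype(int)

# Step 4: evaluate the central model with the three reported metrics.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, combined),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```

Swapping Step 3 for another combining method while keeping Steps 1, 2, and 4 fixed gives one central model per method, evaluated on the same test set.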

Results Analysis

Here, using the below Fig, the performance of the combined central models is analyzed.

Banknote Authentication dataset: The highest performance is shown by the Dynamic Classifier Selection (DCS-LA) method, followed by the Maximization and Median combination methods, which show the same accuracy and slightly different ROC and F1. The Simple Averaging, Weighted Averaging, and Weighted Majority Vote (WMV) methods achieve similar performance. Stacking is the lowest-performing method, followed by the Dynamic Ensemble Selection (DES) method.

Haberman's Survival dataset: Again, DCS-LA showed the top performance. The DES and Stacking methods that produced a low performance for the previous dataset are the second and third best-performing methods. The other algebraic, averaging, and voting methods perform almost the same, achieving good accuracy and F1 but low ROC.

Titanic dataset: Stacking shows the highest accuracy, but DES achieves slightly higher ROC and F1, so DES is the overall top-performing method. Unlike in the previous datasets, the algebraic (Maximization and Median), Averaging, and Voting methods show varying performance here. Among the algebraic methods, Median performed better. Among the averaging methods, Simple Averaging performed better.
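
A dynamic selection method tops each of the three datasets, so it may help to see the core idea of local-accuracy selection in code. The following is a simplified NumPy sketch of DCS with local accuracy, not the contents of DCS-LA.py: the function name, the Euclidean neighbourhood, the value of k, and the toy validation data are all assumptions made for illustration.

```python
import numpy as np


def dcs_la_predict(x, X_val, y_val, val_preds, test_preds, k=3):
    """Return the prediction of the locally most accurate base model.

    x:          (n_features,) sample to classify
    X_val:      (n_val, n_features) labelled validation samples
    y_val:      (n_val,) validation labels
    val_preds:  (n_models, n_val) each model's predictions on X_val
    test_preds: (n_models,) each model's prediction for x
    """
    # Region of competence: the k validation samples closest to x.
    dists = np.linalg.norm(X_val - x, axis=1)
    neighbours = np.argsort(dists)[:k]
    # Local accuracy of every base model inside that region.
    local_acc = (val_preds[:, neighbours] == y_val[neighbours]).mean(axis=1)
    best = int(np.argmax(local_acc))
    return int(test_preds[best]), best


# Toy 1-D validation set: model 0 is reliable on the left, model 1 on the right.
X_val = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_val = np.array([0, 1, 0, 1, 0, 1])
val_preds = np.array([
    [0, 1, 0, 0, 0, 0],  # model 0: correct on the three left samples
    [1, 1, 1, 1, 0, 1],  # model 1: correct on the three right samples
])
test_preds = np.array([0, 1])  # hypothetical predictions for the query sample

left = dcs_la_predict(np.array([0.05]), X_val, y_val, val_preds, test_preds)
right = dcs_la_predict(np.array([1.15]), X_val, y_val, val_preds, test_preds)
print(left, right)
```

DES follows the same neighbourhood idea but keeps a subset of locally competent models and combines their outputs instead of picking a single one.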

Observations

The following observations were made during experimentation:

The computational cost for creating an ensemble is not much higher than training a single base model. This is because multiple versions of the base model need to be generated during parameter tuning anyway. Also, the computational cost of combining base models trained on multiple IoT devices was small due to the simplicity of the presented combination strategies.
To construct a good ensemble, it is recommended to create base models as accurate and as diverse as possible.
Creating a learning algorithm that is consistently better than others is a hopeless daydream. For example, in the above Fig, Stacking shows the top performance for the Titanic dataset and the lowest for the Banknote dataset.
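
The second observation asks for diverse base models. One common way to quantify diversity is the mean pairwise disagreement between the models' predictions. The sketch below is an illustrative assumption rather than part of the repo: the function name and the example predictions are hypothetical.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_disagreement(preds):
    """Average fraction of samples on which two base models disagree.

    preds: (n_models, n_samples) predicted labels.
    0.0 means all models predict identically; larger values mean more diversity.
    """
    pairs = combinations(range(preds.shape[0]), 2)
    return float(np.mean([(preds[i] != preds[j]).mean() for i, j in pairs]))


# Hypothetical predictions of 3 base models on 4 samples.
preds = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
])
diversity = mean_pairwise_disagreement(preds)
print(round(diversity, 3))
```

Diversity alone is not sufficient: it should be read together with each base model's accuracy, since models that disagree because they are wrong do not help the ensemble.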

Useful Books, Toolboxes and Datasets

Books

Ensemble Methods: Foundations and Algorithms: Classical textbook covering most of the ensemble learning techniques. A must-read for people in the field.
Ensemble Machine Learning: Methods and Applications: Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs.
Applications of Supervised and Unsupervised Ensemble Methods: This book contains the extended papers presented at the 2nd Workshop on Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA), in conjunction with ECAI.
Data Mining and Knowledge Discovery Handbook Chapter 45 (Ensemble Methods for Classifiers): This chapter provides an overview of ensemble methods in classification tasks. We present all important types of ensemble method including boosting and bagging. Combining methods and modeling issues such as ensemble diversity and ensemble size are discussed.
Outlier Ensembles: An Introduction: Great intro book for ensemble learning in outlier analysis.

Toolboxes

combo: combo is a comprehensive Python toolbox for combining machine learning (ML) models and scores for various tasks, including classification, clustering, and anomaly detection. It supports the combination of ML models from core libraries such as scikit-learn and xgboost.
pycobra: Python library implementing ensemble methods for regression, classification, and visualisation tools including Voronoi tessellations.
DESlib: A Python library for dynamic classifier and ensemble selection.
imbalanced-learn: A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning.

Datasets

As a subfield of machine learning, ensemble learning is usually tested against general machine learning benchmark datasets. Some helpful links can be found below:

Classic Papers

Overview & Survey

Ensemble methods in machine learning @ MCS.
Popular ensemble methods: An empirical study @ JAIR.
Ensemble learning: A survey @ Wiley Interdisciplinary Reviews.

Boosting

Xgboost: A scalable tree boosting system @ KDD.
Lightgbm: A highly efficient gradient boosting decision tree @ NIPS.
CatBoost: unbiased boosting with categorical features @ NIPS.

Clustering Ensemble

Cluster Ensembles – A Knowledge Reuse Framework for Combining Multiple Partitions @ JMLR.
Clusterer Ensemble @ KBS.
A survey of clustering ensemble algorithms @ IJPRAI.
Clustering ensemble method @ Cybernetics.

Outlier Ensemble

Outlier ensembles: position paper @ SIGKDD Explorations.
Ensembles for unsupervised outlier detection: challenges and research questions a position paper @ SIGKDD Explorations.
Isolation forest @ ICDM.
Outlier detection with autoencoder ensembles @ SDM.
An Unsupervised Boosting Strategy for Outlier Detection Ensembles @ PAKDD.
LSCP: Locally selective combination in parallel outlier ensembles @ SDM.

Ensemble Learning for Data Stream

A survey on ensemble learning for data stream classification @ ACM Computing Surveys.
Ensemble learning for data stream analysis: A survey @ Information Fusion.

Key Algorithms

Bagging predictors @ Machine Learning.
A decision-theoretic generalization of on-line learning and an application to boosting @ JCSS.
Bagging, Boosting @ AAAI/IAAI.
Stacked generalization @ Neural Networks.
Stacked regressions @ Machine Learning.

Source and Ranking Portals

Reputed Data Mining Conferences/Workshops/Journals

Conferences and Workshops

Journals

Doing Good Research and Get it Published

How to do good research, Get it published in SIGKDD and get it cited: A fantastic tutorial by Prof. Eamonn Keogh (UC Riverside)

Checklist for Revising a SIGKDD Data Mining Paper: A concise checklist by Prof. Eamonn Keogh (UC Riverside)

How to Write and Publish Research Papers for the Premier Forums in Knowledge & Data Engineering: A tutorial on how to structure data mining papers by Prof. Xindong Wu (University of Louisiana at Lafayette)

bharathsudharsan/ML-Model-Combining