This repository provides an Automated Machine Learning (AutoML) implementation for static and dynamic data analytics problems. It includes a case study of IoT anomaly detection that uses several ML algorithms and optimization/AutoML methods (for automating and optimizing ML algorithms). It can also serve as a tutorial to help machine learning researchers automatically obtain optimized machine learning models for a specific task.
This code also accompanies a review paper published in Engineering Applications of Artificial Intelligence (IF: 7.8):
L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi:
This paper and code will help industrial users, data analysts, and researchers to better develop machine learning models using automation technology.
- A comprehensive tutorial on hyperparameter optimization (automatically tuning the hyperparameters of machine learning algorithms) can be found in: Hyperparameter-Optimization-of-Machine-Learning-Algorithms
Paper Link
IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
One-column version: arXiv
Two-column version: Elsevier
AutoML Pipeline and Procedures
- Automated Data Pre-Processing
- Automated Feature Engineering
- Automated Model Selection
- Hyper-Parameter Optimization
- Automated Model Updating (for addressing concept drift, and only for online learning and data stream analytics)
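The first three stages above can be sketched end to end in a few lines. The following is a minimal, dependency-free illustration, not the notebooks' implementation: the helper names (`min_max_scale`, `select_features`, `select_model`), the two toy models, and the synthetic data are all assumptions made for this example.

```python
import random
from statistics import mean, pvariance


def min_max_scale(rows):
    """Pre-processing: rescale every feature column to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def select_features(rows, min_variance=1e-3):
    """Feature engineering: keep only columns whose variance exceeds a threshold."""
    cols = list(zip(*rows))
    keep = [i for i, c in enumerate(cols) if pvariance(c) > min_variance]
    return [[row[i] for i in keep] for row in rows], keep


class MajorityClass:
    """Baseline model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Predicts the label whose class mean is closest to the sample."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            members = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(c) for c in zip(*members)]
        return self

    def predict(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda lab: dist(self.centroids[lab]))


def accuracy(model, X, y):
    return mean(1.0 if model.predict(x) == t else 0.0 for x, t in zip(X, y))


def select_model(candidates, X_train, y_train, X_val, y_val):
    """Model selection: fit every candidate and keep the best validation score."""
    scored = [
        (accuracy(m.fit(X_train, y_train), X_val, y_val), type(m).__name__)
        for m in candidates
    ]
    return max(scored)


# Synthetic data (assumption): column 0 separates the classes, column 1 is
# noise on a much larger scale, column 2 is constant and carries no information.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [[rng.gauss(5.0 * y, 1.0), rng.gauss(100.0, 20.0), 7.0] for y in labels]

# For brevity the scaler sees all rows; a real pipeline would fit it on the
# training split only to avoid leaking validation statistics.
X, kept = select_features(min_max_scale(rows))
X_train, y_train, X_val, y_val = X[:150], labels[:150], X[150:], labels[150:]
best_score, best_name = select_model(
    [MajorityClass(), NearestCentroid()], X_train, y_train, X_val, y_val
)
print(f"kept columns: {kept}, selected model: {best_name} ({best_score:.2f})")
```

Each stage is an ordinary function, so a stage can be swapped (for example, a different scaler or a larger candidate list) without touching the others. Hyperparameter optimization and model updating are left out here to keep the sketch short.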
Quick Navigation of The Paper
Section 3: IoT data analytics overview
Section 3: Model learning (introduces the common machine learning algorithms)
Section 4: AutoML overview & optimization techniques (introduces AutoML and its techniques)
Section 5: Automated data pre-processing
Section 6: Automated feature engineering
Section 7: Automated model updating by handling concept drift
Section 8: Selection of evaluation metrics and validation methods
Section 9: AutoML tools and libraries
Section 10: Case study (experimental results; sample code in "AutoML_Batch_Learning_CIC.ipynb")
Section 11: Open challenges and future research directions
Summary tables for Section 3: Tables 1 & 2: A comprehensive overview of common ML models, their hyperparameters, their advantages and limitations, and suitable IoT tasks
Summary table for Section 4: Table 3: A comparison of common optimization methods for CASH and HPO problems
Summary table for Section 7: Table 5: A comparison of concept drift methods for automated model updating
Summary table for Section 10: Table 6: The specifications of the proposed AutoML pipeline
Summary table for Section 11: Table 12: The challenges and research directions of applying AutoML to IoT data analytics
The AutoML implementation for static/batch data analytics can be found in AutoML_Batch_Learning_Dataset_1.ipynb and AutoML_Batch_Learning_Dataset2.ipynb
The AutoML implementation for dynamic/online data stream analytics can be found in AutoML_Online_Learning_Dataset_1.ipynb and AutoML_Online_Learning_Dataset2.ipynb
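Online learning on a data stream is usually evaluated in a test-then-train (prequential) loop, and model updating is triggered when a drift detector fires. The sketch below illustrates that loop with the standard library only. It is not the code in the online-learning notebooks: `MajorityLearner` and `WindowDriftDetector` are hypothetical stand-ins for a real incremental model and for detectors such as ADWIN or DDM, and the stream is synthetic.

```python
import random
from collections import Counter, deque


class MajorityLearner:
    """Tiny incremental model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class WindowDriftDetector:
    """Flags drift when the recent error rate exceeds the best one seen by a margin.

    A deliberately simple window-based rule (an assumption of this sketch).
    """

    def __init__(self, window=50, margin=0.3):
        self.errors = deque(maxlen=window)
        self.best_rate = 1.0
        self.margin = margin

    def update(self, error):
        self.errors.append(error)
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        rate = sum(self.errors) / len(self.errors)
        self.best_rate = min(self.best_rate, rate)
        return rate > self.best_rate + self.margin


def run(stream, adapt):
    """Prequential evaluation; if adapt is True, reset the model on drift."""
    model, detector = MajorityLearner(), WindowDriftDetector()
    errors, drifts = 0, []
    for t, (x, y) in enumerate(stream):
        error = int(model.predict(x) != y)  # test on the new sample first ...
        errors += error
        model.learn(x, y)  # ... then train on it
        if adapt and detector.update(error):
            drifts.append(t)
            model, detector = MajorityLearner(), WindowDriftDetector()
    return 1 - errors / len(stream), drifts


# Synthetic stream (assumption): the majority label flips at t = 500.
rng = random.Random(1)
stream = [(None, int(rng.random() < (0.9 if t < 500 else 0.1))) for t in range(1000)]

static_acc, _ = run(stream, adapt=False)
adaptive_acc, drifts = run(stream, adapt=True)
print(f"static: {static_acc:.3f}, adaptive: {adaptive_acc:.3f}, drifts at: {drifts}")
```

Resetting the whole model is the bluntest possible update strategy; it is used here only because it makes the effect of drift handling easy to see.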
Static Machine Learning & Deep Learning Algorithms
- Random forest (RF)
- LightGBM
- K-nearest neighbor (KNN)
- Naive Bayes (NB)
- Artificial Neural Networks (ANN)
Dynamic/Online Learning Algorithms
- Hoeffding Tree (HT)
- Leveraging Bagging (LB)
- Adaptive Random Forest (ARF)
- Streaming Random Patches (SRP)
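The Hoeffding Tree at the base of these methods decides when to split a leaf using the Hoeffding bound: after n samples, the observed difference between the two best split candidates is trusted once it exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split metric and delta is the allowed error probability. The snippet below is a small sketch of that rule only; the function names are chosen for this example, and practical implementations add details such as a tie threshold and a grace period that are omitted here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the gap between the two best candidates exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative numbers (assumptions): information gain on a binary problem
# has range R = 1; the two best attributes differ by 0.10 in observed gain.
for n in (200, 2000):
    eps = hoeffding_bound(1.0, 1e-7, n)
    decision = should_split(0.30, 0.20, 1.0, 1e-7, n)
    print(f"n={n}: epsilon={eps:.4f}, split={decision}")
```

Because epsilon shrinks as n grows, the same observed gap that is inconclusive after a few hundred samples becomes sufficient evidence after a few thousand.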
Optimization/AutoML Algorithms
- Grid search
- Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)
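Grid search, the simpler of the two methods, evaluates every combination in a predefined hyperparameter grid and keeps the best one. The following sketch shows the idea with a small pure-Python KNN classifier on synthetic data. The function names, the grid values, and the data are assumptions made for illustration and are not taken from the notebooks.

```python
import itertools
import random
from collections import Counter


def knn_predict(X_train, y_train, x, k, p):
    """Majority vote among the k nearest neighbours (Minkowski-p distance)."""
    # The p-th root is skipped because it does not change the ranking.
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def validation_accuracy(params, train, val):
    X_train, y_train = train
    X_val, y_val = val
    hits = sum(
        knn_predict(X_train, y_train, x, **params) == y for x, y in zip(X_val, y_val)
    )
    return hits / len(y_val)


def grid_search(grid, train, val):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((validation_accuracy(params, train, val), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


# Synthetic two-class data (assumption): two Gaussian blobs in 2-D.
rng = random.Random(42)


def make_blobs(n):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)])
        y.append(label)
    return X, y


train, val = make_blobs(120), make_blobs(60)
grid = {"k": [1, 3, 5, 7], "p": [1, 2]}
best_params, best_score, results = grid_search(grid, train, val)
print(f"evaluated {len(results)} configurations; best: {best_params} ({best_score:.2f})")
```

The cost grows with the product of the grid sizes, which is why model-based methods such as BO-TPE are attractive for larger search spaces: they use the results of earlier evaluations to choose which configuration to try next instead of enumerating all of them.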
Datasets
CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems
- Publicly available at:
IoTID20 dataset, a novel IoT botnet dataset
- Publicly available at:
Requirements
- Python 3.6+
- Keras
- scikit-learn
- hyperopt
- LightGBM
- River
Please feel free to contact me with any questions or collaboration opportunities. I'd be happy to help.
- Email:
- GitHub: LiYangHart and Western OC2 Lab
- LinkedIn: Li Yang
- Google Scholar: Li Yang and OC2 Lab
If you find this repository useful in your research, please cite this article as:
L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi:
title = "IoT data analytics in dynamic environments: From an automated machine learning perspective",
author = "Li Yang and Abdallah Shami",
journal = "Engineering Applications of Artificial Intelligence",
volume = {116},
pages = {1-33},
year = "2022",
doi = "",
url = ""