This repository provides an Automated Machine Learning (AutoML) implementation for static and dynamic data analytics problems. It includes a case study of IoT anomaly detection that applies multiple ML algorithms together with optimization/AutoML methods for automating and tuning them. It can also serve as a tutorial that helps machine learning researchers automatically obtain optimized models for a specific task.
This code is also the implementation of a review paper published in Engineering Applications of Artificial Intelligence (IF: 7.8):
L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi: https://doi.org/10.1016/j.engappai.2022.105366.
This paper and code will help industrial users, data analysts, and researchers to better develop machine learning models using automation technology.
- Comprehensive tutorial code for hyperparameter optimization (automatically tuning the hyperparameters of machine learning algorithms) can be found in: Hyperparameter-Optimization-of-Machine-Learning-Algorithms
IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
One-column version: arXiv
Two-column version: Elsevier
- Automated Data Pre-Processing
- Automated Feature Engineering
- Automated Model Selection
- Hyper-Parameter Optimization
- Automated Model Updating (for addressing concept drift, and only for online learning and data stream analytics)
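The last step, automated model updating, depends on noticing when the data distribution has changed. As a rough illustration only, the sketch below flags drift when the error rate over a sliding window rises well above the lowest rate observed so far. `WindowDriftDetector`, its `window_size` and `margin` parameters, and the simulated error stream are all hypothetical and are not taken from the repository or the paper, which may use different drift detection methods.

```python
from collections import deque


class WindowDriftDetector:
    """Hypothetical helper: flags drift when the windowed error rate
    exceeds the lowest rate seen so far by more than `margin`."""

    def __init__(self, window_size=50, margin=0.3):
        self.window_size = window_size
        self.margin = margin
        self.errors = deque(maxlen=window_size)
        self.best_rate = None

    def update(self, error):
        """Record one outcome (1 = wrong, 0 = correct); return True on drift."""
        self.errors.append(error)
        if len(self.errors) < self.window_size:
            return False  # not enough evidence yet
        rate = sum(self.errors) / self.window_size
        if self.best_rate is None or rate < self.best_rate:
            self.best_rate = rate
        if rate - self.best_rate > self.margin:
            # Reset so a new baseline is learned for the new concept.
            self.errors.clear()
            self.best_rate = None
            return True
        return False


# Simulated outcomes (assumption): the model is correct for 200 samples,
# then the concept changes and every prediction is wrong.
outcomes = [0] * 200 + [1] * 100

detector = WindowDriftDetector(window_size=50, margin=0.3)
drift_points = [i for i, err in enumerate(outcomes) if detector.update(err)]
print(drift_points)
```

In a full pipeline, a drift signal like this would typically trigger retraining or replacing the current model on recent samples.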
Section 3: IoT data analytics overview
Section 3: Model learning (introduces common machine learning algorithms)
Section 4: AutoML overview & optimization techniques (introduces AutoML and its techniques)
Section 5: Automated data pre-processing
Section 6: Automated feature engineering
Section 7: Automated model updating by handling concept drift
Section 8: Selection of evaluation metrics and validation methods
Section 9: AutoML tools and libraries
Section 10: Case study (Experimental results, sample code in "AutoML_Batch_Learning_CIC.ipynb")
Section 11: Open challenges and future research directions
Summary tables for Section 3: Tables 1 & 2: A comprehensive overview of common ML models, their hyperparameters, their advantages and limitations, and suitable IoT tasks
Summary table for Section 4: Table 3: The comparison of common optimization methods for CASH and HPO problems
Summary table for Section 7: Table 5: The comparison of concept drift methods for automated model updating
Summary table for Section 10: Table 6: The specifications of the proposed AutoML pipeline
Summary table for Section 11: Table 12: The challenges and research directions of applying AutoML to IoT data analytics
- The AutoML implementation for static/batch data analytics can be found in AutoML_Batch_Learning_Dataset_1.ipynb and AutoML_Batch_Learning_Dataset2.ipynb
- The AutoML implementation for dynamic/online data stream analytics can be found in AutoML_Online_Learning_Dataset_1.ipynb and AutoML_Online_Learning_Dataset2.ipynb
- Random forest (RF)
- LightGBM
- K-nearest neighbor (KNN)
- Naive Bayes (NB)
- Artificial Neural Networks (ANN)
- Hoeffding Tree (HT)
- Leveraging Bagging (LB)
- Adaptive Random Forest (ARF)
- Streaming Random Patches (SRP)
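The online learners in this list (HT, LB, ARF, SRP) process a data stream one sample at a time and are commonly evaluated in a test-then-train (prequential) fashion. The sketch below shows that loop with a hypothetical stand-in learner, `MajorityClassLearner`, that exposes River-style `predict_one` and `learn_one` methods. The toy stream and the `pkt_rate` feature name are assumptions for illustration, not data from the repository.

```python
from collections import Counter


class MajorityClassLearner:
    """Hypothetical stand-in for an online learner:
    always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream):
    """Test-then-train: predict each sample before the model learns from it."""
    correct = 0
    total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1) test on the unseen sample
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x, y)          # 2) then train on it
    return correct / total if total else 0.0


# Toy stream (assumption): 8 "normal" samples followed by 2 "attack" samples.
stream = [({"pkt_rate": 1.0}, "normal")] * 8 + [({"pkt_rate": 9.0}, "attack")] * 2
accuracy = prequential_accuracy(MajorityClassLearner(), stream)
print(accuracy)
```

Because every sample is scored before it is used for training, the resulting accuracy reflects performance on unseen data without needing a separate hold-out set.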
- Grid search
- Bayesian Optimization with Tree-structured Parzen Estimator (BO-TPE)
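To make the first of these two methods concrete, the sketch below runs an exhaustive grid search over a small hyperparameter grid. The objective `validation_error` is a hypothetical stand-in for a cross-validated error (an assumption, chosen so that its minimum is known), and the grid values are illustrative rather than those used in the notebooks.

```python
from itertools import product


def validation_error(n_estimators, max_depth):
    """Hypothetical stand-in for a cross-validated error; minimized at (100, 8)."""
    return ((n_estimators - 100) / 100) ** 2 + ((max_depth - 8) / 8) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative grid (assumption): 3 x 3 = 9 configurations are evaluated.
grid = {"n_estimators": [50, 100, 200], "max_depth": [4, 8, 16]}
best_params, best_score = grid_search(validation_error, grid)
print(best_params, best_score)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added hyperparameter. BO-TPE instead uses the results of earlier trials to propose the next configuration; in hyperopt this is typically done by passing `algo=tpe.suggest` to `fmin`.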
- CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems
  - Publicly available at: https://www.unb.ca/cic/datasets/ids-2017.html
- IoTID20 dataset, a novel IoT botnet dataset
  - Publicly available at: https://sites.google.com/view/iot-network-intrusion-dataset/home
- Python 3.6+
- Keras
- scikit-learn
- hyperopt
- LightGBM
- River
Please feel free to contact me with any questions or cooperation opportunities. I'd be happy to help.
- Email: liyanghart@gmail.com
- GitHub: LiYangHart and Western OC2 Lab
- LinkedIn: Li Yang
- Google Scholar: Li Yang and OC2 Lab
If you find this repository useful in your research, please cite this article as:
L. Yang and A. Shami, “IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective,” Engineering Applications of Artificial Intelligence, vol. 116, pp. 1-33, 2022, doi: https://doi.org/10.1016/j.engappai.2022.105366.
@article{YANG2022105366,
  title = "IoT data analytics in dynamic environments: From an automated machine learning perspective",
  author = "Li Yang and Abdallah Shami",
  journal = "Engineering Applications of Artificial Intelligence",
  volume = {116},
  pages = {1-33},
  year = "2022",
  doi = "https://doi.org/10.1016/j.engappai.2022.105366",
  url = "https://www.sciencedirect.com/science/article/pii/S0952197622003803"
}