This project focuses on detecting botnet attacks in IoT networks, specifically those from the Mirai and Bashlite malware families. Using machine learning and deep learning techniques, the models classify network traffic as either benign activity or one of 11 distinct malware classes. The study addresses challenges such as data imbalance, cross-device generalization, and feature dimensionality in IoT datasets.
The datasets used in this project are sourced from the UCI Machine Learning Repository. The source data contains network traffic for 9 IoT devices. For this project, we selected the datasets for the following three devices:
- Danmini Doorbell
- Ecobee Thermostat
- Philips Baby Monitor
Each dataset includes benign traffic and malware traffic divided into 11 classes spanning the Mirai and Bashlite attack families.
- Dataset Preparation:
  - Combined datasets from the selected devices, categorized into benign and malware traffic.
  - Conducted exploratory data analysis to visualize distributions and detect potential issues such as missing values.
- Preprocessing:
  - Outlier Removal: Identified and removed extreme values using the IQR method.
  - Class Imbalance Handling: Addressed through undersampling of overrepresented classes and class weighting during training.
- Feature Selection:
  - Applied Recursive Feature Elimination (RFE) using Random Forest to select the top 10 features for efficient model training.
- Model Training:
  - XGBoost for the Danmini Doorbell.
  - Random Forest for the Ecobee Thermostat.
  - Feedforward Neural Network (FNN) for the Philips Baby Monitor.
  - Each model was evaluated on performance metrics and checked for overfitting.
- Cross-Device Generalization:
  - Trained models were tested on datasets from other IoT devices to evaluate adaptability.
  - Fine-tuned the models for cross-device performance using transfer learning.
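
The preprocessing and feature selection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: the data is synthetic (generated with scikit-learn's `make_classification`), and the helper names `remove_iqr_outliers`, `undersample`, and `select_features`, the column names, and the 1.5 IQR multiplier are all assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE


def remove_iqr_outliers(df, feature_cols, k=1.5):
    """Keep rows whose features all lie within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[feature_cols].quantile(0.25)
    q3 = df[feature_cols].quantile(0.75)
    iqr = q3 - q1
    in_range = df[feature_cols].ge(q1 - k * iqr) & df[feature_cols].le(q3 + k * iqr)
    return df[in_range.all(axis=1)]


def undersample(df, label_col, random_state=0):
    """Randomly downsample every class to the size of the rarest class."""
    n_min = df[label_col].value_counts().min()
    return df.groupby(label_col).sample(n=n_min, random_state=random_state)


def select_features(X, y, n_features=10, random_state=0):
    """Run RFE with a Random Forest and return the names of the kept columns."""
    forest = RandomForestClassifier(n_estimators=25, random_state=random_state)
    selector = RFE(forest, n_features_to_select=n_features, step=2)
    selector.fit(X, y)
    return list(X.columns[selector.support_])


# Synthetic, imbalanced stand-in for one device's traffic features (assumption).
X_raw, y_raw = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
feature_cols = [f"f{i}" for i in range(X_raw.shape[1])]
df = pd.DataFrame(X_raw, columns=feature_cols)
df["label"] = y_raw

filtered = remove_iqr_outliers(df, feature_cols)
balanced = undersample(filtered, "label")
selected = select_features(balanced[feature_cols], balanced["label"])
print(len(df), len(filtered), len(balanced), selected)
```

Class weighting, the other imbalance measure mentioned above, could be applied instead of or alongside undersampling, for example through the `class_weight="balanced"` option of `RandomForestClassifier`.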
| Model | Training Accuracy | Cross-Device Accuracy | Accuracy After Fine-Tuning |
|---|---|---|---|
| XGBoost | 100% | 70% | 68.5% |
| Random Forest | 100% | 57.8% | 84.5% |
| Feedforward NN | 86.3% | 60% | 72.8% |
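
The last two columns of the table come from a train-on-one-device, test-on-another workflow. The sketch below illustrates that workflow only; it does not reproduce the numbers above. It assumes scikit-learn's `MLPClassifier` with `warm_start=True` as a lightweight stand-in for the Keras FNN, uses synthetic data with an artificial shift to imitate a second device, and the 20% fine-tuning split and learning rate are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for a source device and a shifted target device (assumption).
X, y = make_classification(n_samples=1200, n_features=10, n_informative=6, random_state=0)
rng = np.random.default_rng(0)
X_src, y_src = X[:600], y[:600]
X_tgt = X[600:] * 1.5 + rng.normal(0.5, 0.3, size=(600, 10))
y_tgt = y[600:]

# Train on the source device only; the scaler is also fitted on source data only.
scaler = StandardScaler().fit(X_src)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                      warm_start=True, random_state=0)
model.fit(scaler.transform(X_src), y_src)

# Hold out most of the target device for testing; keep a small slice for fine-tuning.
X_ft, X_test, y_ft, y_test = train_test_split(
    X_tgt, y_tgt, train_size=0.2, stratify=y_tgt, random_state=0
)
acc_before = accuracy_score(y_test, model.predict(scaler.transform(X_test)))

# Fine-tune: with warm_start=True, fit() continues from the learned weights.
model.set_params(max_iter=50, learning_rate_init=1e-4)
model.fit(scaler.transform(X_ft), y_ft)
acc_after = accuracy_score(y_test, model.predict(scaler.transform(X_test)))

print(f"cross-device accuracy: {acc_before:.3f}, after fine-tuning: {acc_after:.3f}")
```

Evaluating both accuracies on the same held-out target rows keeps the before and after figures comparable. For tree-based models such as XGBoost or Random Forest, fine-tuning would need a different mechanism (for instance, continuing training with additional trees), which this sketch does not cover.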
- Programming Language: Python
- Libraries: Pandas, Scikit-learn, XGBoost, Keras
- Techniques:
  - Recursive Feature Elimination (RFE)
  - Outlier Detection (IQR Method)
  - Cross-validation
  - Transfer Learning
- Automated detection of malware using machine learning models.
- Preprocessing to handle IoT-specific challenges:
  - Outlier removal and class imbalance handling.
  - Dimensionality reduction for efficient training.
- Integration of traditional machine learning and deep learning techniques.
- Cross-device adaptability through transfer learning.
The project demonstrated that machine learning and deep learning can detect IoT botnets. High accuracy was achieved on device-specific datasets, but accuracy dropped when models were applied to other devices. Fine-tuning improved cross-device accuracy for Random Forest (57.8% to 84.5%) and the feedforward neural network (60% to 72.8%), but not for XGBoost (70% to 68.5%). Future work includes developing unified models and exploring real-time deployment strategies.
- Clone this repository and install the dependencies.
- Download the dataset from the UCI Repository.
- Prepare your dataset with the selected features.
- Use the provided scripts to train the model and evaluate it on cross-device datasets.
- Fine-tune the models for generalization if needed.
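
For the dataset preparation step, one possible way to build a labeled table for a single device is sketched below. The directory layout (one CSV per traffic class, optionally grouped in sub-folders), the `load_device` helper, and the column names `f1` and `f2` are hypothetical and chosen only for the example; adapt them to however the downloaded files are organized on disk.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_device(device_dir, selected_features=None):
    """Concatenate every CSV under device_dir, labeling rows by file path."""
    frames = []
    for csv_path in sorted(Path(device_dir).rglob("*.csv")):
        # usecols=None reads all columns; otherwise only the selected features.
        frame = pd.read_csv(csv_path, usecols=selected_features)
        # Hypothetical naming: "mirai_attacks/udp.csv" becomes "mirai_attacks_udp".
        relative = csv_path.relative_to(device_dir).with_suffix("")
        frame["label"] = "_".join(relative.parts)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Tiny demonstration with made-up files standing in for a downloaded device folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "doorbell"
    (root / "mirai_attacks").mkdir(parents=True)
    pd.DataFrame({"f1": [0.1, 0.2], "f2": [1, 2]}).to_csv(
        root / "benign_traffic.csv", index=False
    )
    pd.DataFrame({"f1": [0.9], "f2": [9]}).to_csv(
        root / "mirai_attacks" / "udp.csv", index=False
    )
    data = load_device(root, selected_features=["f1"])

print(data)
```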
- Improved understanding of data preprocessing techniques such as outlier removal and class imbalance handling.
- Developed expertise in feature selection with RFE and model optimization.
- Gained hands-on experience with machine learning algorithms and transfer learning techniques.
Feel free to explore the code and results in the repository.