🚀 IoT Malware Detection Model

📖 Overview

This project focuses on detecting botnet attacks in IoT networks, specifically targeting Mirai and Bashlite malware families. By leveraging machine learning and deep learning techniques, the model classifies network traffic into benign activity and 11 distinct malware classes. The study addresses challenges like data imbalance, cross-device generalization, and feature dimensionality in IoT datasets.


📂 Dataset

The datasets used in this project are sourced from the UCI Machine Learning Repository 🌐. The source dataset contains network traffic data for 9 IoT devices. For this project, we selected the data for the following three devices:

  • 🔔 Danmini Doorbell
  • 🌡️ Ecobee Thermostat
  • 🍼 Philips Baby Monitor

Each dataset includes benign traffic and malware traffic divided into 11 classes from the Mirai and Bashlite attack families.
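
A minimal sketch of how the per-class traffic tables for one device might be stacked into a single labeled table, assuming each class has already been loaded into its own pandas DataFrame (for example with `pd.read_csv`). The helper `label_and_combine`, the class names, and the columns `f1` and `f2` are hypothetical placeholders, not names taken from the dataset or from this repository.

```python
import pandas as pd


def label_and_combine(frames):
    """Stack per-class DataFrames into one table with a 'label' column.

    `frames` maps a class name to the DataFrame of traffic records
    captured for that class.
    """
    labeled = [df.assign(label=name) for name, df in frames.items()]
    return pd.concat(labeled, ignore_index=True)


# Tiny hypothetical stand-in for one device's capture files.
doorbell = label_and_combine({
    "benign": pd.DataFrame({"f1": [0.1, 0.2, 0.3], "f2": [1.0, 1.1, 0.9]}),
    "mirai_udp": pd.DataFrame({"f1": [5.0, 5.2], "f2": [9.0, 8.7]}),
    "bashlite_scan": pd.DataFrame({"f1": [2.0], "f2": [4.0]}),
})

print(doorbell["label"].value_counts())  # quick look at class balance
print(doorbell.isna().sum().sum())       # total number of missing values
```

Keeping the label as an ordinary column makes it easy to inspect class balance and missing values before any preprocessing.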


๐Ÿ› ๏ธ Project Workflow

  1. ๐Ÿ“Š Dataset Preparation:

    • Combined datasets from the selected devices, categorized into benign and malware traffic.
    • Conducted exploratory data analysis to visualize distributions and detect potential issues like missing values.
  2. 🧹 Preprocessing:

    • Outlier Removal: Identified and removed extreme values using the IQR method.
    • Class Imbalance Handling: Addressed through undersampling of overrepresented classes and class weighting during training.
  3. 📉 Feature Selection:

    • Applied Recursive Feature Elimination (RFE) using Random Forest to select the top 10 features for efficient model training.
  4. 🤖 Model Training:

    • XGBoost for 🔔 Danmini Doorbell.
    • Random Forest for 🌡️ Ecobee Thermostat.
    • Feedforward Neural Network (FNN) for 🍼 Philips Baby Monitor.
    • Each model was trained, checked for overfitting, and evaluated on performance metrics.
  5. 🌐 Cross-Device Generalization:

    • Trained models were tested on datasets from other IoT devices to evaluate adaptability.
    • Fine-tuned the models for cross-device performance using transfer learning.
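
The preprocessing in step 2 can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the 1.5 × IQR fence, the per-class cap, the helper names, and the synthetic column `f1` are all hypothetical choices, and the class weights use scikit-learn's "balanced" heuristic.

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight


def remove_iqr_outliers(df, feature_cols, k=1.5):
    """Drop rows where any feature lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[feature_cols].quantile(0.25)
    q3 = df[feature_cols].quantile(0.75)
    iqr = q3 - q1
    within = (df[feature_cols] >= q1 - k * iqr) & (df[feature_cols] <= q3 + k * iqr)
    return df[within.all(axis=1)]


def undersample(df, label_col, max_per_class, seed=0):
    """Cap every class at max_per_class rows by random sampling."""
    parts = [
        group.sample(n=min(len(group), max_per_class), random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts).reset_index(drop=True)


def balanced_class_weights(labels):
    """Map each class to a weight inversely proportional to its frequency."""
    classes = np.unique(labels)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
    return dict(zip(classes, weights))


# Synthetic, imbalanced demo data (hypothetical values).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "f1": np.append(rng.normal(0.0, 1.0, 300), 1000.0),  # last row is an extreme value
    "label": ["benign"] * 60 + ["mirai_udp"] * 241,
})

cleaned = remove_iqr_outliers(demo, ["f1"])
balanced = undersample(cleaned, "label", max_per_class=50)
weights = balanced_class_weights(balanced["label"].to_numpy())
print(balanced["label"].value_counts())
print(weights)
```

The resulting weight dictionary could be passed to a model that accepts per-class weights. Undersampling and weighting are shown together only because the workflow lists both.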

📈 Results

Model            Training Accuracy   Cross-Device Accuracy   Accuracy After Fine-Tuning
XGBoost          100%                70%                     68.5%
Random Forest    100%                57.8%                   84.5%
Feedforward NN   86.3%               60%                     72.8%
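
To make the three accuracy columns concrete, here is a hedged sketch of how such numbers could be computed. It uses synthetic data in place of real device traffic, simulates a second device with a hypothetical scale and offset shift, and stands in for fine-tuning with scikit-learn's `warm_start` option, which adds new trees fitted on target-device data. The project's actual fine-tuning procedure is not specified here, so this is only one possible approach.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for two devices that share the same classes.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_src, X_tgt, y_src, y_tgt = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Hypothetical device shift: rescale and offset the target device's features.
rng = np.random.default_rng(0)
X_tgt = X_tgt * 1.5 + rng.normal(0.5, 0.1, size=X_tgt.shape)

# Small slice of target data for fine-tuning; the rest is held out for testing.
X_tune, X_test, y_tune, y_test = train_test_split(
    X_tgt, y_tgt, test_size=0.7, stratify=y_tgt, random_state=0)

model = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
model.fit(X_src, y_src)
acc_train = accuracy_score(y_src, model.predict(X_src))     # training accuracy
acc_cross = accuracy_score(y_test, model.predict(X_test))   # cross-device accuracy

# "Fine-tune": keep the 50 source trees and fit 50 more on target-device data.
model.n_estimators += 50
model.fit(X_tune, y_tune)
acc_tuned = accuracy_score(y_test, model.predict(X_test))   # after fine-tuning

print(f"train={acc_train:.3f} cross={acc_cross:.3f} tuned={acc_tuned:.3f}")
```

The held-out target split is used for both the cross-device and the fine-tuned score, so the two numbers are comparable and neither is measured on data the model was fitted to.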

๐Ÿ› ๏ธ Technologies and Tools

  • Programming Languages: Python ๐Ÿ
  • Libraries: Pandas, Scikit-learn, XGBoost, Keras ๐Ÿ“ฆ
  • Techniques:
    • Recursive Feature Elimination (RFE) โœ…
    • Outlier Detection (IQR Method) โœ‚๏ธ
    • Cross-validation ๐Ÿ”„
    • Transfer Learning ๐ŸŒ
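
As an illustration of two of the techniques listed above, the sketch below runs RFE with a Random Forest to keep 10 features and then scores the reduced feature set with 5-fold cross-validation. The data is synthetic, and the forest size, elimination step, and fold count are assumptions chosen to keep the example fast, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a device's feature table.
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           n_classes=3, random_state=0)

# Repeatedly fit the forest and drop the 5 least important features
# until only 10 remain (30 -> 25 -> 20 -> 15 -> 10).
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    n_features_to_select=10,
    step=5,
)
selector.fit(X, y)
X_selected = selector.transform(X)
selected_idx = selector.get_support(indices=True)

# 5-fold cross-validated accuracy on the reduced feature set.
scores = cross_val_score(
    RandomForestClassifier(n_estimators=25, random_state=0),
    X_selected, y, cv=5,
)
print("selected feature indices:", selected_idx)
print("mean CV accuracy:", scores.mean())
```

Selecting features on the full data before cross-validating lets information from the validation folds influence the selection. A stricter setup would place the selector inside a scikit-learn `Pipeline` so that it is refit within each fold.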

🌟 Features

  • Automated detection of malware using machine learning models 🤖.
  • Preprocessing to handle IoT-specific challenges:
    • Outlier removal and class imbalance handling 🧹.
    • Dimensionality reduction for efficient training 📉.
  • Integration of traditional machine learning and deep learning techniques 🛠️.
  • Cross-device adaptability through transfer learning 🌐.

๐Ÿ Conclusion

The project demonstrated the capability of machine learning and deep learning to detect IoT botnets. While high accuracy was achieved on device-specific datasets, challenges in cross-device generalization were addressed through fine-tuning. Future work includes developing unified models and exploring real-time deployment strategies ๐Ÿš€.


🔧 How to Run

  1. Clone this repository and install the dependencies 🛠️.
  2. Download the dataset from the UCI Repository 🌐.
  3. Prepare your dataset with the selected features 🗂️.
  4. Use the provided scripts to train the model and evaluate it on cross-device datasets 🤖.
  5. Fine-tune the models for generalization if needed 🔄.
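
The repository's own scripts are not reproduced here. As a rough stand-in for the training part of step 4, the sketch below trains a class-weighted Random Forest on synthetic data and compares training accuracy with held-out accuracy, which is one simple way to check for overfitting. The split ratio, forest size, and data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one device's prepared feature table.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=3, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# class_weight="balanced" reweights classes inversely to their frequency.
model = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                               random_state=1)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
gap = train_acc - val_acc  # a large gap suggests overfitting
print(f"train={train_acc:.3f} val={val_acc:.3f} gap={gap:.3f}")
```

A training accuracy near 100% is only informative when it is read next to the held-out score.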

Learnings 🧠

  • Improved understanding of data preprocessing techniques such as outlier removal and class imbalance handling.
  • Developed expertise in feature selection with RFE and model optimization.
  • Gained hands-on experience with machine learning algorithms and transfer learning techniques.

Feel free to explore the code and results in the repository. 😊