This project detects Distributed Denial of Service (DDoS) attacks using machine learning models. It uses a real-world dataset and applies three machine learning algorithms: Random Forest, Logistic Regression, and a Neural Network (MLPClassifier). The workflow covers the essential steps from data preprocessing to model evaluation and comparison.
- Pandas: For data manipulation and analysis.
- NumPy: For numerical operations.
- Matplotlib & Seaborn: For data visualization.
- Scikit-learn: For machine learning algorithms, model training, evaluation, and metrics.
- Loading Libraries: Importing the necessary libraries for data analysis, machine learning, and evaluation.
- Data Loading: Loading the dataset that contains both normal and attack traffic data.
- Data Preprocessing:
- Converting categorical data into dummy/indicator variables (if applicable).
- Normalizing or standardizing the data using StandardScaler to ensure consistent scaling for certain models.
- Exploratory Data Analysis (EDA):
- Analyzing the distribution of features using visualization techniques (e.g., distribution plots).
- Data Splitting:
- Dividing the dataset into training and testing sets using the train_test_split() method.
- Model Training:
- Training three different models:
- Random Forest: An ensemble model using decision trees.
- Logistic Regression: A linear model for binary classification.
- Neural Network (MLPClassifier): A model using a multi-layer perceptron.
- Model Evaluation:
- Using metrics such as accuracy, F1 score, precision, recall, and confusion matrices to evaluate model performance.
- Plotting ROC curves to assess the models' classification effectiveness.
- Model Comparison:
- Comparing the performance of each model based on evaluation metrics.
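
The workflow steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data from `make_classification`, the hypothetical `protocol` column, the 70/30 split, and all hyperparameters are assumptions chosen only to make the example self-contained. Scaling is wrapped in a pipeline for the two models that are sensitive to feature scale, so the scaler is fitted on the training split only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the real traffic dataset (1 = attack, 0 = normal).
X_raw, y = make_classification(n_samples=2000, n_features=10,
                               n_informative=6, random_state=42)
df = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(10)])

# ASSUMPTION: a hypothetical categorical column, added only to show dummy encoding.
df["protocol"] = ["tcp" if v > 0 else "udp" for v in df["feature_0"]]

# Convert categorical data into dummy/indicator variables.
X = pd.get_dummies(df, columns=["protocol"], dtype=float)

# Split into training and testing sets (the 70/30 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Tree ensembles do not need scaling; the linear model and the MLP get a scaler.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=42)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the attack class
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

# Side-by-side comparison of the scalar metrics.
comparison = pd.DataFrame(results).T.drop(columns="confusion_matrix")
print(comparison)
```

Because the data here is synthetic, the printed numbers will not match the results reported for the real dataset; the sketch only shows how the steps fit together.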
- Random Forest:
- Accuracy: 0.9995
- F1 Score: 0.9995
- Precision: 1.0000
- Recall: 0.9990
Observation: The Random Forest classifier achieved nearly perfect accuracy and F1 score, indicating excellent performance in detecting DDoS attacks.
- Logistic Regression:
- Accuracy: 0.9447
- F1 Score: 0.9498
- Precision: 0.9100
- Recall: 0.9933
Observation: Logistic Regression performed well but had noticeably lower accuracy (0.9447) and precision (0.9100) than Random Forest, meaning it raised more false alarms on normal traffic. However, it maintained a high recall, meaning it correctly identified most attack cases.
- Neural Network (MLPClassifier):
- Accuracy: 0.9841
- F1 Score: 0.9850
- Precision: 0.9802
- Recall: 0.9898
Observation: The Neural Network also performed well, combining high accuracy with a strong F1 score. Its accuracy, F1 score, and precision fall between those of Random Forest and Logistic Regression, while its recall (0.9898) is slightly lower than that of both other models.
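
As a quick sanity check on the figures above, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the numbers listed in the results; the helper name `f1_from` is hypothetical and introduced only for illustration.

```python
# Reported metrics, copied from the results above.
reported = {
    "Random Forest": {"precision": 1.0000, "recall": 0.9990, "f1": 0.9995},
    "Logistic Regression": {"precision": 0.9100, "recall": 0.9933, "f1": 0.9498},
    "Neural Network": {"precision": 0.9802, "recall": 0.9898, "f1": 0.9850},
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: f1_from(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, m in reported.items():
    print(f"{name}: reported F1 = {m['f1']:.4f}, recomputed F1 = {recomputed[name]:.4f}")
```

The recomputed values agree with the reported F1 scores to within rounding, which suggests the three metrics for each model were computed on the same predictions.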
This project demonstrates the effectiveness of three machine learning models in detecting DDoS attacks. The Random Forest model shows the best performance (accuracy 0.9995), followed by the Neural Network (0.9841), while Logistic Regression trails (0.9447), mainly because of its lower precision. The comparison highlights the trade-offs in precision, recall, and overall performance.