Anomaly Detection

Overview

This project focuses on anomaly detection using machine learning models applied to the UGR'16 dataset, specifically the July 2016 capture. For efficiency, the analysis uses a subsample of the data, keeping one out of every 50 lines.
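
As a rough illustration, a subsample like this can be drawn with a few lines of Python. The file names below are placeholders, and the UGR'16 capture is assumed to be a plain CSV of flow records.

```python
# Keep one out of every 50 flow records from the raw UGR'16 CSV.
# File names are placeholders; point them at the local copy of the capture.
SAMPLE_EVERY = 50

with open("july_week5.csv") as src, open("july_week5_sampled.csv", "w") as dst:
    for i, line in enumerate(src):
        if i % SAMPLE_EVERY == 0:
            dst.write(line)
```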

Project Description

This project aims to detect anomalies in the dataset using three machine learning models: Isolation Forest, One-Class SVM, and XGBoost. The results below report the performance of each model on both the training and test sets and serve as a benchmark for how effective each method is at identifying anomalies in the data.
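
A minimal sketch of how the three models can be fit with scikit-learn and xgboost is shown below. The feature matrix `X` and binary labels `y` (1 = anomaly) are assumed to have been prepared from the sampled flows; the toy data and hyperparameters are illustrative placeholders, not the exact settings used in the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from xgboost import XGBClassifier

# X: numeric feature matrix built from the sampled flows; y: 1 = anomaly, 0 = normal.
# Toy data stands in here so the snippet runs on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.02).astype(int)

# Unsupervised detectors: trained without labels, then mapped to 0/1 predictions.
iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
iso_pred = (iso.predict(X) == -1).astype(int)   # -1 means "anomaly"

ocsvm = OneClassSVM(nu=0.02, kernel="rbf").fit(X)
ocsvm_pred = (ocsvm.predict(X) == -1).astype(int)

# Supervised baseline: XGBoost uses the labels directly.
xgb = XGBClassifier(n_estimators=100).fit(X, y)
xgb_pred = xgb.predict(X)
```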

Models and Results

| Model | Train Balanced Accuracy | Train Precision | Train Recall | Train R² | Train F1 | Test Balanced Accuracy | Test Precision | Test Recall | Test R² | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Isolation Forest | 49.36% | 0.42% | 0.74% | -1.79 | 0.53% | 49.43% | 0.48% | 0.86% | -1.82 | 0.61% |
| One-Class SVM | 71.23% | 2.28% | 91.94% | -38.98 | 4.45% | 68.78% | 1.93% | 85.98% | -43.43 | 3.77% |
| XGBoost | 99.99% | 97.73% | 100.00% | 0.98 | 98.85% | 99.73% | 97.02% | 99.50% | 0.96 | 98.25% |
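
The metrics in the table can be computed with scikit-learn along the following lines; `y_true` and `y_pred` are assumed to be the binary ground-truth labels and a model's 0/1 predictions on a given split.

```python
from sklearn.metrics import (balanced_accuracy_score, precision_score,
                             recall_score, r2_score, f1_score)

def report(y_true, y_pred):
    """Return the five metrics reported above for one model on one split."""
    return {
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "r2": r2_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }
```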

Dataset

The dataset used in this project is the July 2016 capture from the UGR'16 dataset, subsampled (one out of every 50 lines) for efficiency.
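
For reference, a sampled UGR'16 CSV can be loaded with pandas roughly as follows. The column names are an assumption based on the netflow-style fields UGR'16 ships with, and the file name is a placeholder; adjust both to match the local copy.

```python
import pandas as pd

# Assumed netflow-style column layout for UGR'16 flow records; the file name is
# a placeholder for the sampled CSV produced earlier.
columns = ["timestamp", "duration", "src_ip", "dst_ip", "src_port", "dst_port",
           "protocol", "flags", "fwd_status", "tos", "packets", "bytes", "label"]

df = pd.read_csv("july_week5_sampled.csv", names=columns, header=None)

# UGR'16 labels flows as "background" or as one of several attack types; treat
# anything that is not background as an anomaly.
df["anomaly"] = (df["label"] != "background").astype(int)
```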