Web Page Phishing Detection

Introduction

This project develops and evaluates machine learning models that detect phishing URLs. The dataset contains 11,430 URLs, each described by 87 extracted features, which makes it a useful benchmark for phishing detection systems.

Dataset

Project Overview

The project consists of the following components:

  • Data Preprocessing: Data cleaning, feature selection, and outlier handling.
  • Model Building: Training several classifiers (e.g., Logistic Regression, Decision Tree, Random Forest, SVM) on the dataset.
  • Model Evaluation: Measuring model performance with metrics such as accuracy, precision, recall, and F1 score.
  • Hyperparameter Tuning: Optimizing model hyperparameters to improve performance.
  • Dimensionality Reduction: Applying PCA and t-SNE to project the 87 features onto fewer dimensions.
  • Cross-Validation: Assessing model generalization using k-fold cross-validation.
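The README does not name the search strategy used for hyperparameter tuning. Exhaustive grid search is a typical choice, and the sketch below shows its core loop: enumerate every combination of candidate values with `itertools.product` and keep the best-scoring one. `grid_search` and its `score_fn` callback are hypothetical; the parameter names in the example (`max_depth`, `n_estimators`) are real scikit-learn Random Forest arguments, but the candidate values and the toy scoring function are invented for illustration.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in `param_grid` and return the best one.

    `param_grid` maps parameter names to lists of candidate values.
    `score_fn(params)` is a hypothetical callback returning a validation
    score where higher is better (e.g. a mean cross-validated F1 score).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    results = []
    # product() yields one tuple of values per combination in the grid.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        results.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, results


if __name__ == "__main__":
    grid = {"max_depth": [2, 4, 8], "n_estimators": [50, 100]}
    # Toy scorer standing in for "train a model and score it".
    toy_score = lambda p: -abs(p["max_depth"] - 4) + p["n_estimators"] / 1000
    best, score, _ = grid_search(grid, toy_score)
    print(best)  # -> {'max_depth': 4, 'n_estimators': 100}
```

The cost grows multiplicatively with each parameter (here 3 x 2 = 6 fits per evaluation), which is why randomized search is often preferred for large grids. scikit-learn's `GridSearchCV` wraps this same loop around cross-validation.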
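k-fold cross-validation can be sketched without any library: shuffle the row indices once, cut them into k folds of near-equal size, and let each fold serve once as the held-out test set while the other k - 1 folds are used for training. `k_fold_indices` and `cross_validate` are hypothetical helpers, and `fit_and_score` is a placeholder for whatever trains a model and scores it.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every index lands in exactly one test fold; fold sizes differ by at
    most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(fit_and_score, n_samples, k=5, seed=0):
    """Return the per-fold scores and their mean.

    `fit_and_score(train_idx, test_idx)` is a hypothetical callback that
    trains on the training rows and returns a score on the test rows.
    """
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return scores, sum(scores) / len(scores)
```

With the 11,430 URLs and k = 5, each test fold holds 2,286 URLs. Plain k-fold as sketched here does not preserve class proportions; a stratified split (for example scikit-learn's `StratifiedKFold`) keeps the phishing/legitimate ratio the same in every fold, which matters if the classes are imbalanced.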

Model Evaluation

Models are evaluated with the following metrics:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Confusion Matrix
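Accuracy, precision, recall, and F1 score can all be derived from the binary confusion matrix. The sketch below computes them from scratch, assuming labels are encoded as 1 = phishing and 0 = legitimate (an assumption; the README does not state the encoding). Precision is the share of URLs flagged as phishing that really are phishing, recall is the share of phishing URLs that get flagged, and F1 is their harmonic mean. The function names are hypothetical; `sklearn.metrics` provides equivalents (`confusion_matrix`, `precision_score`, `recall_score`, `f1_score`).

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return [[tn, fp], [fn, tp]] for a binary task.

    Rows are true classes, columns are predicted classes, negative class
    first: the layout scikit-learn uses for labels [0, 1].
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def _ratio(num, den):
    # Report 0.0 instead of dividing by zero (e.g. when no URL was flagged).
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as phishing."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)  # flagged URLs that really are phishing
    recall = _ratio(tp, tp + fn)     # phishing URLs that were flagged
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
    print(confusion_matrix(y_true, y_pred))  # -> [[3, 1], [2, 2]]
    # accuracy 0.625, precision 2/3, recall 0.5, F1 4/7
    print(classification_metrics(y_true, y_pred))
```

For phishing detection the two error types have different costs: a false negative lets a phishing URL through, while a false positive blocks a legitimate page. Reporting precision and recall alongside accuracy makes that trade-off visible.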