This project aims to develop and evaluate machine learning models for the detection of phishing URLs. The provided dataset contains 11,430 URLs with 87 extracted features, making it a valuable resource for benchmarking phishing detection systems.
- Dataset Name: Phishing Detection Dataset
- Dataset Description: A balanced collection of phishing and legitimate URLs.
The project is organized into the following components:
- Data Preprocessing: Data cleaning, feature selection, and outlier handling.
- Model Building: Training various machine learning models (e.g., Logistic Regression, Decision Trees, Random Forest, SVM) on the dataset.
- Model Evaluation: Assessing model performance using metrics like accuracy, precision, recall, and F1-score.
- Hyperparameter Tuning: Optimizing model hyperparameters for improved performance.
- Dimensionality Reduction: Reducing the feature space with techniques like PCA and t-SNE.
- Cross-Validation: Assessing model generalization using k-fold cross-validation.
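
The components above can be combined roughly as in the following sketch. It is an illustration under stated assumptions, not the project's actual code: the dataset file and column names are not given here, so synthetic data from `make_classification` stands in for the URL features. The function name `run_sketch`, the choice of Logistic Regression, the PCA size, and the `C` grid are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_sketch(random_state=0):
    """Scale, reduce, tune, and cross-validate a classifier on stand-in data."""
    # ASSUMPTION: synthetic binary data replaces the real URL feature table.
    X, y = make_classification(
        n_samples=400, n_features=20, random_state=random_state
    )

    # Hold out a stratified test set so the final score uses unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Preprocessing, dimensionality reduction, and the model in one pipeline,
    # so each cross-validation fold fits the scaler and PCA on its own data.
    pipeline = Pipeline(
        steps=[
            ("scale", StandardScaler()),
            ("pca", PCA(n_components=10)),
            ("clf", LogisticRegression(max_iter=1000)),
        ]
    )

    # Hyperparameter tuning with stratified 5-fold cross-validation.
    search = GridSearchCV(
        pipeline,
        param_grid={"clf__C": [0.1, 1.0, 10.0]},
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state),
        scoring="f1",
    )
    search.fit(X_train, y_train)

    # With scoring="f1", score() reports the F1-score on the held-out set.
    test_f1 = search.score(X_test, y_test)
    return search.best_params_, test_f1


if __name__ == "__main__":
    best_params, test_f1 = run_sketch()
    print("Best parameters:", best_params)
    print("Held-out F1:", round(test_f1, 3))
```

Placing the scaler and PCA inside the pipeline is a deliberate choice: it keeps information from the validation folds out of the preprocessing steps. Other estimators such as `RandomForestClassifier` or `SVC` could replace the `clf` step with a matching parameter grid.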
Models are evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
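
To make these metrics concrete, the sketch below derives each one from the four confusion-matrix counts using only the Python standard library. It assumes labels are encoded as `1` for phishing and `0` for legitimate, which is not stated in this document. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # phishing correctly flagged
        elif pred == positive:
            fp += 1  # legitimate URL wrongly flagged
        elif truth == positive:
            fn += 1  # phishing URL missed
        else:
            tn += 1  # legitimate URL correctly passed
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, cols: predicted
    }


if __name__ == "__main__":
    # ASSUMPTION: 1 = phishing, 0 = legitimate; toy labels for illustration.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

In this toy example one phishing URL is missed and one legitimate URL is wrongly flagged, so accuracy, precision, recall, and F1 all come out to 0.75. In practice, `sklearn.metrics` provides equivalent functions such as `precision_score` and `confusion_matrix`.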