This repository contains a machine learning project for classifying emails as spam or ham (not spam) using Logistic Regression. The project leverages a dataset from Kaggle and aims to accurately identify spam emails to help filter unwanted messages.
The project follows these key steps:
- Data Preprocessing: Cleaning and preparing text data for analysis.
- Feature Extraction: Using `TfidfVectorizer` to convert email content into numerical features.
- Model Building: Implementing a Logistic Regression model for spam classification.
- Model Evaluation: Assessing the model's performance using accuracy metrics on the test set.
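The feature extraction, model building, and evaluation steps can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the example messages are hand-written placeholders rather than the Kaggle data, and the variable names and the `max_iter` and `stop_words` settings are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages for illustration only (1 = spam, 0 = ham).
train_texts = [
    "Win a free prize now, click here",
    "Congratulations, you won a cash reward",
    "Cheap loans approved instantly, apply today",
    "Are we still meeting for lunch tomorrow?",
    "Please find the project report attached",
    "Can you call me when you get home?",
]
train_labels = [1, 1, 1, 0, 0, 0]

test_texts = [
    "Claim your free cash prize today",
    "See you at the meeting tomorrow",
]
test_labels = [1, 0]

# TfidfVectorizer turns raw text into TF-IDF weighted term features;
# LogisticRegression then learns a linear decision boundary on them.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Evaluate on held-out messages using accuracy.
predictions = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the vectorizer and classifier in a single pipeline ensures the TF-IDF vocabulary is learned from the training messages only and then reused unchanged on the test messages.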
The dataset used in this project is sourced from Kaggle. It consists of labeled email data indicating whether each message is spam or ham. The data is split into training and testing sets to evaluate the model effectively.
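As a rough sketch of the split described above, the labeled data can be encoded and divided like this. The column names `text` and `label`, the 80/20 ratio, and the in-memory rows are all assumptions made for illustration; the real Kaggle file may use different names and would normally be loaded with `pd.read_csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Kaggle dataset; column names are assumed.
df = pd.DataFrame(
    {
        "text": [
            "Free entry to win a prize",
            "Urgent: claim your reward now",
            "Lowest price on loans, apply today",
            "You have been selected for a cash gift",
            "Exclusive offer, click to win",
            "Lunch at noon?",
            "I'll send the notes tonight",
            "Thanks for your help yesterday",
            "Running late, start without me",
            "Happy birthday! See you soon",
        ],
        "label": ["spam"] * 5 + ["ham"] * 5,
    }
)

# Encode the string labels as integers: spam -> 1, ham -> 0.
df["label_num"] = df["label"].map({"spam": 1, "ham": 0})

# Hold out 20% for testing; stratify keeps the spam/ham ratio
# the same in both subsets, and random_state makes it repeatable.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label_num"]
)

print(len(train_df), "training rows,", len(test_df), "test rows")
```

Stratifying matters for spam data in particular, since real datasets are often imbalanced and an unstratified split can leave the test set with very few spam examples.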
To run this project, install the required Python packages:
pip install numpy pandas scikit-learn