This project aims to build an Email Spam Classifier using Natural Language Processing (NLP) techniques. The classifier distinguishes between spam and non-spam (ham) emails, assisting users in managing their email inboxes effectively.
- Cleans the dataset by removing duplicates and handling missing values.
- Uses exploratory data analysis (EDA) to understand data distribution and patterns.
- Preprocesses text data through tokenization, stopword removal, and normalization.
- Implements several machine learning models for classification, including SVM, Naive Bayes, Decision Tree, and Random Forest, among others.
- Improves model performance through hyperparameter tuning and ensemble learning techniques.
- Deploys the model as a web application using Flask, allowing users to input email content and receive real-time predictions on spam likelihood.
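
The preprocessing and classification steps above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual training code: the toy messages, the choice of TF-IDF features with Multinomial Naive Bayes, and the names `spam_classifier` and `predict_label` are all assumptions made for the example. Hyperparameter tuning and ensembling are not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data, invented purely for illustration.
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "are we still meeting for lunch",
    "please review the attached report",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TfidfVectorizer tokenizes, lowercases (a simple form of normalization),
# and drops English stopwords before weighting terms by TF-IDF.
spam_classifier = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("nb", MultinomialNB()),
])
spam_classifier.fit(train_texts, train_labels)


def predict_label(text):
    """Return the predicted label ("spam" or "ham") for one message."""
    return str(spam_classifier.predict([text])[0])


print(predict_label("free prize offer"))
print(predict_label("meeting tomorrow"))
```

Wrapping the vectorizer and the classifier in one `Pipeline` means the same preprocessing is applied at training and prediction time, and the whole object can be saved as a single file.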
- README.md: Overview of the project and instructions.
- app.py: Flask application for deploying the model.
- model.pkl: Pickle file containing the trained model.
- templates/: HTML templates for the web interface.
- static/: Static files such as CSS and JavaScript.
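
As a rough sketch of how `app.py` and `model.pkl` might fit together, the example below loads a pickled model and exposes it through a Flask route. The `/predict` route, the `email_text` form field, the JSON response, and the helper names `load_model` and `create_app` are hypothetical; the real application may instead render the HTML templates in `templates/`.

```python
import pickle

from flask import Flask, jsonify, request


def load_model(path="model.pkl"):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as model_file:
        return pickle.load(model_file)


def create_app(model):
    """Build a Flask app around any object that has a predict(list_of_texts) method."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        # "email_text" is an assumed form field name.
        text = request.form.get("email_text", "").strip()
        if not text:
            return jsonify(error="email_text is required"), 400
        # Assumes the model accepts raw text, e.g. a scikit-learn Pipeline
        # that includes its own vectorizer.
        label = model.predict([text])[0]
        return jsonify(label=str(label))

    return app
```

Under these assumptions, `app.py` could start the server with `create_app(load_model()).run()`. Passing the model into `create_app` rather than loading it at import time makes the route easy to exercise with a stand-in model.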
- Clone the repository: `git clone <repository-url>`
- Install dependencies: `pip install -r requirements.txt`
- Run the Flask application: `python app.py`
- Access the web interface in your browser and input email content for prediction.
- Dataset: SMS Spam Collection Dataset from Kaggle, a set of SMS messages labeled as spam or ham.
- Libraries: Flask, scikit-learn, and other Python libraries listed in requirements.txt.
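
The cleaning step mentioned earlier (removing duplicates and handling missing values) might look like the following with pandas. This is a sketch under assumptions: pandas is not named in this README, the column names `label` and `text` are hypothetical, and the small in-memory frame stands in for the real dataset file.

```python
import pandas as pd


def clean_messages(df, text_col="text", label_col="label"):
    """Drop rows with missing or blank values, then drop duplicate messages."""
    cleaned = df.dropna(subset=[text_col, label_col])
    cleaned = cleaned[cleaned[text_col].str.strip() != ""]
    cleaned = cleaned.drop_duplicates(subset=[text_col], keep="first")
    return cleaned.reset_index(drop=True)


# Invented sample rows: one duplicate, one missing text, one missing label.
raw = pd.DataFrame({
    "label": ["ham", "spam", "spam", "ham", None],
    "text": [
        "See you at lunch",
        "Win a free prize now",
        "Win a free prize now",
        None,
        "Call me later",
    ],
})

cleaned = clean_messages(raw)
print(cleaned)
```

Here missing rows are dropped rather than imputed, since a message without text or without a label carries nothing a classifier can learn from.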