Classify whether the messages are spam or ham (legitimate).
- The dataset comes from the UCI Machine Learning Repository.
- It contains 5574 messages, of which around 86% are legitimate and about 13% are spam.
- More information about the dataset is available on its UCI Machine Learning Repository page.
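The class split quoted above can be checked with a short script. This is a minimal sketch, assuming the dataset is stored as tab-separated lines of the form `label<TAB>message`; that layout, the `class_balance` helper, and the in-memory sample are illustrative assumptions, not taken from this repository.

```python
import csv
import io
from collections import Counter


def class_balance(lines):
    """Return {label: fraction} for tab-separated 'label<TAB>message' lines."""
    # QUOTE_NONE keeps quotation marks inside messages from confusing the parser.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(row[0] for row in reader if row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical four-message sample standing in for the real dataset file.
sample = io.StringIO(
    "ham\tAre we still meeting at 5?\n"
    "spam\tWIN a FREE prize now! Reply YES\n"
    "ham\tCall me when you get home\n"
    "ham\tSee you tomorrow\n"
)
balance = class_balance(sample)
print(balance)
```

To run it on the real data, pass an open file handle instead of `sample`. On the full dataset the two fractions should land near the 86% and 13% figures above.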
- Clone the project to your local machine
git clone git@github.com:archihalder/spam-classifier.git
- Enter the directory
cd spam-classifier
- Create a virtual environment in your current directory
pip install virtualenv
virtualenv spam-env
source spam-env/bin/activate
- Get the required modules to run
pip install -r requirements.txt
- Download the complete set of nltk data packages. Run the following in a Python shell
import nltk
nltk.download('all')
- Run the file
python3 src/spam-classifier.py
- Model used - Naive Bayes Classifier
- Accuracy achieved - 98.4%
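For readers unfamiliar with the model, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is not the code in `src/spam-classifier.py`, which may rely on a library and different preprocessing. The `tokenize` function, the `NaiveBayes` class, and the toy messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # messages per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data for illustration only; not drawn from the real dataset.
train_texts = [
    "win a free prize now",
    "free cash claim your prize",
    "are we meeting for lunch",
    "call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_texts, train_labels)

test_texts = ["claim your free prize", "see you at lunch"]
test_labels = ["spam", "ham"]
print(accuracy(model, test_texts, test_labels))
```

Scores are summed in log space to avoid numeric underflow on longer messages. The accuracy printed for this toy split says nothing about the 98.4% reported above, which was measured on the real dataset.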