archihalder/spam-classifier

Python, MIT license

Spam Classifier

Description

Classify messages as spam or ham (legitimate).

Dataset

The dataset comes from the UCI Machine Learning Repository.
It contains 5,574 messages, of which about 86% are legitimate (ham) and about 13% are spam.
More information about the dataset is available from its source page.
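
The class balance quoted above can be checked with a few lines of Python. The sketch below is only an illustration: it assumes the data is stored as one message per line in a `label<TAB>message` layout, and it runs on a small made-up sample (`SAMPLE`) rather than the real file. The function name `class_balance` is hypothetical and is not taken from this repo.

```python
import csv
import io
from collections import Counter

# Made-up sample in the assumed "label<TAB>message" layout (not real dataset rows).
SAMPLE = (
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tWINNER! Claim your free prize now, text WIN to 80000\n"
    "ham\tI'll call you later tonight\n"
    "ham\tCan you pick up some milk on the way home?\n"
)


def class_balance(lines):
    """Return {label: fraction of messages} for tab-separated rows."""
    # QUOTE_NONE keeps quotes and apostrophes inside messages as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(row[0] for row in reader if row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


balance = class_balance(io.StringIO(SAMPLE))
for label, share in sorted(balance.items()):
    print(f"{label}: {share:.0%}")
```

To run it on a real file, pass an open file object instead of the `io.StringIO` wrapper. The imbalance matters when reading the results: with about 86% ham, a model that always predicts "ham" would already score about 86% accuracy.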

How to use this repo

Clone the project to your local machine

git clone git@github.com:archihalder/spam-classifier.git

Enter the directory

cd spam-classifier

Create a virtual environment in your current directory

pip install virtualenv
virtualenv spam-env
source spam-env/bin/activate

Install the required modules

pip install -r requirements.txt

Download the complete NLTK data collection. Run the following in a Python shell

import nltk
nltk.download('all')

Run the file

python3 src/spam-classifier.py

Results and Observations

Model used - Naive Bayes Classifier
Accuracy achieved - 98.4%
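
For readers unfamiliar with the model, here is a minimal sketch of a Naive Bayes text classifier in plain Python. It is not the code in `src/spam-classifier.py`: it assumes the multinomial variant with Laplace smoothing and a simple regex tokenizer, and the class name `NaiveBayes`, the `tokenize` helper, and the training messages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # messages seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Work in log space: log prior plus the sum of log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages, for illustration only.
messages = [
    "are we still meeting for lunch",
    "call me when you get home",
    "see you at the office tomorrow",
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim your free reward now",
]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = NaiveBayes().fit(messages, labels)
print(model.predict("claim your free prize"))
print(model.predict("see you at lunch"))
```

The smoothing term `alpha` keeps a word that appears in only one class from driving the other class's probability to zero, which matters for short messages with rare words.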