Primary language: Python. License: MIT.

Spam Classifier

Description

Classify whether messages are spam or ham (legitimate).


Dataset

  • The dataset was collected from the UCI Machine Learning Repository. A link to the website is given here.
  • It contains 5574 messages, of which about 86% are legitimate (ham) and about 13% are spam.
  • For more information about the dataset, click here.
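
To check the class balance quoted above on your own copy of the data, a short sketch like the following may help. It assumes each line of the file has the form `label<TAB>text`, which the README does not confirm. The helper names `parse_labeled_lines` and `class_shares` are hypothetical, and the sample rows are made up for illustration.

```python
from collections import Counter


def parse_labeled_lines(lines):
    """Split 'label<TAB>text' lines into (label, text) pairs, skipping blank lines."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: the label and the message are separated by the first tab.
        label, text = line.split("\t", 1)
        pairs.append((label, text))
    return pairs


def class_shares(pairs):
    """Return each label's share of the messages as a fraction between 0 and 1."""
    counts = Counter(label for label, _ in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up sample rows, not taken from the dataset.
sample = [
    "ham\tAre we still meeting for lunch?\n",
    "spam\tYou have won a prize, reply now to claim\n",
    "ham\tCall me when you get home\n",
    "ham\tSee you tomorrow\n",
]

shares = class_shares(parse_labeled_lines(sample))
print(shares)
```

`parse_labeled_lines` accepts any iterable of lines, so an open file object can be passed in place of `sample`.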

How to use this repo

  1. Clone the project to your local machine
     git clone git@github.com:archihalder/spam-classifier.git
  2. Enter the directory
     cd spam-classifier
  3. Create and activate a virtual environment in your current directory
     pip install virtualenv
     virtualenv spam-env
     source spam-env/bin/activate
  4. Install the required modules
     pip install -r requirements.txt
  5. Download the complete NLTK data collection. Run the following in a Python shell
     import nltk
     nltk.download('all')
  6. Run the file
     python3 src/spam-classifier.py

Results and Observations

  • Model used - Naive Bayes Classifier
  • Accuracy achieved - 98.4%
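
For readers unfamiliar with the model named above, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, using only the standard library. It is not the code in `src/spam-classifier.py`, which may tokenize, preprocess, and train differently. The function names and the toy messages are illustrative assumptions, and the sketch makes no claim about reproducing the reported accuracy.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(pairs):
    """Count messages per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in pairs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training messages carrying this label.
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (label, word) pairs above zero.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training messages, made up for illustration.
training = [
    ("spam", "win a free prize now"),
    ("spam", "free cash claim your prize"),
    ("ham", "are we meeting for lunch"),
    ("ham", "call me when you get home"),
]

model = train(training)
print(predict(model, "claim your free prize"))
print(predict(model, "are you home for lunch"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.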