Email Classification with Naive Bayes

This project classifies emails from the Enron Email Dataset as spam or not-spam (ham) using a Naive Bayes model.

File Contents and Purpose

data
- ham: contains all the ham emails for training.
- spam: contains all the spam emails for training.
- testing: contains a mixture of ham and spam emails for testing.
naivebayes.py: functions for learning the model's parameters from the training data and measuring the model's performance on the testing data.
util.py: functions for parsing the data files.
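
To make the training and testing steps concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace (add-one) smoothing. It is not the code in naivebayes.py: the function names (train, classify, accuracy), the choice of the multinomial variant, and the assumption that each email has already been parsed into a list of word tokens are all illustrative. The repository may use a different variant or different smoothing.

```python
import math
from collections import Counter


def train(ham_docs, spam_docs):
    """Estimate log-priors and smoothed log-likelihoods.

    Each argument is a list of emails, and each email is a list of tokens.
    Assumes both classes contain at least one email (hypothetical helper).
    """
    counts = {"ham": Counter(), "spam": Counter()}
    for doc in ham_docs:
        counts["ham"].update(doc)
    for doc in spam_docs:
        counts["spam"].update(doc)

    vocab = set(counts["ham"]) | set(counts["spam"])
    n_docs = len(ham_docs) + len(spam_docs)
    log_prior = {
        "ham": math.log(len(ham_docs) / n_docs),
        "spam": math.log(len(spam_docs) / n_docs),
    }

    log_like = {}
    for label, word_counts in counts.items():
        # Add-one smoothing: every vocabulary word gets one extra count.
        total = sum(word_counts.values()) + len(vocab)
        log_like[label] = {
            word: math.log((word_counts[word] + 1) / total) for word in vocab
        }
    return log_prior, log_like, vocab


def classify(tokens, log_prior, log_like, vocab):
    """Return the label with the higher log-posterior score."""
    scores = {}
    for label in log_prior:
        # Words never seen in training are skipped.
        scores[label] = log_prior[label] + sum(
            log_like[label][word] for word in tokens if word in vocab
        )
    return max(scores, key=scores.get)


def accuracy(labeled_docs, log_prior, log_like, vocab):
    """Fraction of (tokens, label) pairs that are classified correctly."""
    correct = sum(
        classify(tokens, log_prior, log_like, vocab) == label
        for tokens, label in labeled_docs
    )
    return correct / len(labeled_docs)


# Toy data standing in for the parsed contents of data/ham and data/spam.
ham = [["meeting", "schedule", "tomorrow"], ["project", "report", "attached"]]
spam = [["win", "money", "now"], ["free", "money", "offer"]]
log_prior, log_like, vocab = train(ham, spam)
prediction = classify(["free", "money"], log_prior, log_like, vocab)
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, which matters for long emails.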

Dataset

The data is a preprocessed version of the Enron email database. See V. Metsis, I. Androutsopoulos and G. Paliouras, “Spam Filtering with Naive Bayes – Which Naive Bayes?”, Proceedings of the 3rd Conference on Email and Anti-Spam (CEAS 2006), Mountain View, CA, USA, 2006.
