This project classifies emails from the Enron Email Dataset as spam or not-spam (ham) using a Naive Bayes model.
- data
  - ham: contains all the ham emails for training.
  - spam: contains all the spam emails for training.
  - testing: contains a mixture of ham and spam emails for testing.
- naivebayes.py: contains functions for learning the model's parameters from the training data and measuring the model's performance on the testing data.
- util.py: contains functions for parsing the data in the files.
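
The learning and evaluation steps described for naivebayes.py could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes a multinomial bag-of-words model with Laplace smoothing, emails already split into word lists, and at least one training email per class. The names `learn_parameters`, `classify`, `accuracy`, and `alpha` are hypothetical.

```python
import math
from collections import Counter


def learn_parameters(ham_docs, spam_docs, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized emails.

    Assumption: each email is a list of word tokens, and alpha is the
    Laplace smoothing constant (hypothetical parameter, not from the project).
    """
    vocab = {w for doc in ham_docs + spam_docs for w in doc}
    total_docs = len(ham_docs) + len(spam_docs)
    params = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, docs in (("ham", ham_docs), ("spam", spam_docs)):
        counts = Counter(w for doc in docs for w in doc)
        # Smoothed denominator: total words in the class plus alpha per vocab word.
        denom = sum(counts.values()) + alpha * len(vocab)
        params["log_prior"][label] = math.log(len(docs) / total_docs)
        params["log_likelihood"][label] = {
            w: math.log((counts[w] + alpha) / denom) for w in vocab
        }
    return params


def classify(params, doc):
    """Return the label with the higher log posterior score."""
    scores = {}
    for label in ("ham", "spam"):
        score = params["log_prior"][label]
        for w in doc:
            if w in params["vocab"]:  # skip words never seen in training
                score += params["log_likelihood"][label][w]
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(params, labeled_docs):
    """Fraction of (tokens, label) pairs that are classified correctly."""
    correct = sum(classify(params, doc) == label for doc, label in labeled_docs)
    return correct / len(labeled_docs)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing constant keeps a word that appears in only one class from forcing the other class's probability to zero.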
The data is a preprocessed version of the Enron email database. See V. Metsis, I. Androutsopoulos and G. Paliouras, “Spam Filtering with Naive Bayes – Which Naive Bayes?” Proceedings of the 3rd Conference on Email and Anti-Spam (CEAS 2006), Mountain View, CA, USA, 2006.