
This is a spam filter that uses Bayes' theorem and Python's NLTK package to perform basic text analysis.

Primary language: Jupyter Notebook
License: MIT

Python Bayesian Spam Filter

Project Overview

We receive a lot of mail, but our mailbox automatically filters out the spam and keeps only the ham (the mail you actually want, the opposite of spam) in the inbox. How exactly does a mailbox calculate whether a mail is spam or not? This is a spam filter implemented in Python to showcase the use of a Naive Bayes classifier and the Bag-of-Words model in a mailbox.
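To make the idea concrete, here is a minimal sketch of a Naive Bayes classifier over a Bag-of-Words representation. It is not the code from this repository: the function names (`tokenize`, `train`, `classify`), the toy messages, the regex tokenizer (a stand-in for NLTK tokenization), and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A simple regex stand-in for NLTK tokenization (assumption).
    """
    return re.findall(r"[a-z']+", text.lower())


def train(messages):
    """Build Bag-of-Words counts per class from (label, text) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for label, text in messages:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the label with the higher log posterior.

    Assumes both classes appear at least once in the training data.
    """
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: share of training messages with this label.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen during training
            # Log likelihood with add-one (Laplace) smoothing.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
training_data = [
    ("spam", "Win cash prize now"),
    ("spam", "Claim your free prize now"),
    ("ham", "Meeting at noon tomorrow"),
    ("ham", "Lunch with the team tomorrow"),
]
word_counts, class_counts = train(training_data)
print(classify("free cash prize", word_counts, class_counts))
print(classify("team meeting tomorrow", word_counts, class_counts))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a word seen in only one class from forcing the other class's probability to zero.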

Contents

For a detailed walk-through of the code and an explanation of the theory, please see the Python notebook or the website.
If you are more interested in the code itself, please read the Python file.
The remaining txt files are training and testing data.

Modules

pip install nltk

  • nltk: natural language processing. Please also download the Punkt tokenizer models and the stopwords corpus through nltk:
import nltk
nltk.download('punkt')
nltk.download('stopwords')