Text Spam Filter Using Natural Language Processing

This repository contains a text spam filter built with NLTK and scikit-learn.

Implementation

In this project, I implemented the basics of tokenising, part-of-speech tagging, stemming, chunking, and named entity recognition. I then moved on to machine learning and text classification with several scikit-learn classifiers: a simple support vector classifier, k-nearest neighbours (KNN), decision tree, random forest, logistic regression, SGD, and Naive Bayes. Finally, I combined them with a voting classifier, an ensemble method, to improve model accuracy. The dataset comes from the UCI Machine Learning Repository. It contains over 5,000 labelled SMS messages collected for mobile phone spam research.
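As a rough illustration of the voting step, here is a minimal sketch that combines three of the classifier types named above with scikit-learn's `VotingClassifier`. The toy messages, the choice of `CountVectorizer` for features, and the hyperparameters are assumptions made for the example; they are not taken from the project's notebook or from the UCI dataset.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data, made up for this sketch (not the UCI SMS dataset).
messages = [
    "free prize claim your cash now",
    "win a free holiday call now",
    "urgent you have won cash claim today",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you send me the notes from class",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Hard voting: each fitted classifier casts one vote and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("svc", SVC(kernel="linear")),
    ],
    voting="hard",
)

# Bag-of-words counts feed the ensemble; the pipeline keeps both steps together.
model = make_pipeline(CountVectorizer(), ensemble)
model.fit(messages, labels)

predictions = model.predict(["claim your free cash", "lunch at home today"])
print(list(predictions))
```

Hard voting is used here because it only needs each classifier's predicted label; soft voting would require every estimator to expose class probabilities.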

The project is divided into the following sections:

Regular Expressions
Feature Engineering
Multiple scikit-learn Classifiers
Ensemble Methods
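To give a feel for the Regular Expressions section, here is a minimal sketch of regex-based normalisation that could be applied to each SMS before feature extraction. The function name `normalise`, the specific patterns, and the placeholder tokens (`emailaddress`, `webaddress`, and so on) are assumptions for illustration and may differ from what the project actually uses.

```python
import re


def normalise(message: str) -> str:
    """Replace common spam markers with placeholder tokens and tidy the text."""
    text = message
    # Order matters: emails and URLs are replaced before digits and punctuation.
    text = re.sub(r"\S+@\S+\.\S+", " emailaddress ", text)
    text = re.sub(r"https?://\S+|www\.\S+", " webaddress ", text)
    text = re.sub(r"[£$€]", " moneysymbol ", text)
    # Treat runs of 10 or 11 digits as phone numbers (an assumption for this sketch).
    text = re.sub(r"\b\d{10,11}\b", " phonenumber ", text)
    text = re.sub(r"\d+(?:\.\d+)?", " number ", text)
    # Drop remaining punctuation, collapse whitespace, and lowercase.
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


sample = "WINNER!! Call 09061701461 to claim £1000 at http://example.com now"
print(normalise(sample))
# winner call phonenumber to claim moneysymbol number at webaddress now
```

Mapping concrete values such as amounts and phone numbers to shared tokens lets a bag-of-words model learn from the presence of a number or link rather than from each distinct value.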
