Primary language: Jupyter Notebook

NLP-spam-or-ham

Exploration of NLP in the context of a text-message classification problem.

I explore the pre-processing steps of a typical NLP pipeline: punctuation removal, tokenization, stopword removal, lemmatization, and vectorization. After feature engineering, I compare how well a Random Forest classifier and a Gradient Boosting classifier label text messages as either "spam" or "ham".
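The pre-processing and feature-engineering steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's actual code: the tiny stopword list, the lemma lookup table (a stand-in for a real lemmatizer such as NLTK's `WordNetLemmatizer`), the two engineered features `body_len` and `punct_pct`, and the sample messages are all hypothetical and chosen only for illustration.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list such as NLTK's English stopwords corpus.
STOPWORDS = {"a", "an", "the", "you", "have", "to", "now", "are", "at", "me", "is", "and", "of", "for"}

# Hypothetical lemma lookup standing in for a real lemmatizer:
# it maps a few inflected forms to their base forms.
LEMMAS = {"won": "win", "winners": "winner", "prizes": "prize", "messages": "message", "calls": "call"}


def clean(text):
    """Punctuation removal -> tokenization -> stopword removal -> lemmatization."""
    no_punct = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    tokens = re.split(r"\W+", no_punct)
    # Drop empty strings and stopwords, then map each token to its lemma.
    return [LEMMAS.get(tok, tok) for tok in tokens if tok and tok not in STOPWORDS]


def build_vocab(docs):
    """Sorted vocabulary over all cleaned documents."""
    return sorted({tok for doc in docs for tok in doc})


def vectorize(tokens, vocab):
    """Bag-of-words count vector aligned with the vocabulary order."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def engineered_features(text):
    """Hypothetical engineered features: length without spaces, % punctuation."""
    body_len = len(text) - text.count(" ")
    punct = sum(1 for ch in text if ch in string.punctuation)
    punct_pct = round(100 * punct / body_len, 1) if body_len else 0.0
    return [body_len, punct_pct]


# Hypothetical sample messages and labels.
messages = [
    "WINNER!! You have won a prize. Call now to claim.",
    "Are you at the meeting? Call me later.",
]
labels = ["spam", "ham"]

cleaned = [clean(m) for m in messages]
vocab = build_vocab(cleaned)
# Each row: engineered features followed by the bag-of-words counts.
rows = [engineered_features(m) + vectorize(toks, vocab) for m, toks in zip(messages, cleaned)]

print(vocab)
print(rows)  # [[40, 10.0, 1, 1, 0, 0, 1, 1, 1], [31, 6.5, 1, 0, 1, 1, 0, 0, 0]]
```

In a fuller pipeline, scikit-learn's `CountVectorizer` or `TfidfVectorizer` would normally replace the hand-rolled vectorizer, and feature rows like these, together with their labels, could be passed to `RandomForestClassifier` and `GradientBoostingClassifier` via `.fit(X, y)` and compared on held-out messages.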