Exploration of NLP in the context of a text-message classification problem.
I explore the pre-processing steps of a typical NLP pipeline, such as punctuation removal, tokenization, stopword removal, lemmatization, and vectorization. I then perform feature engineering and compare how well a Random Forest classifier and a gradient-boosted classifier label text messages as either "spam" or "ham".
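
To make the pre-processing steps concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names `clean_text` and `count_vectorize`, the tiny `STOPWORDS` set, and the two sample messages are all hypothetical and chosen for illustration. Lemmatization is left out because it needs an external library, and the vectorizer is a plain term-count scheme rather than any particular library's implementation.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real pipeline would use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "to", "you", "i", "and", "of", "in", "for", "at"}


def clean_text(text):
    """Strip punctuation, lowercase, tokenize, and drop stopwords."""
    # Punctuation removal: keep every character that is not ASCII punctuation.
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    # Tokenization: split on runs of non-word characters (whitespace here).
    tokens = re.split(r"\W+", no_punct.lower())
    # Stopword removal: also discards empty strings produced by the split.
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def count_vectorize(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [clean_text(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Made-up sample messages, not taken from any dataset.
messages = [
    "WINNER!! Claim your free prize now.",
    "Are you free for lunch at noon?",
]
vocabulary, vectors = count_vectorize(messages)
```

Each row of `vectors` lines up with `vocabulary`, so the rows can be passed as a feature matrix to a classifier, with engineered features appended as extra columns if desired.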