The purpose of this project was to explore NLP by building a URL spam filter, so that unwanted URLs are filtered out and made inaccessible. Natural Language Processing is well suited to this kind of problem once a model has been trained on enough examples.
This project was fun to do and a very good learning experience. The dataset for this exploration was quite good, but it still required thorough cleaning and preparation before the SVC model could be trained to recognize the URLs we want to block. NLP is still useful today, despite the big shift towards LLMs, and what can be done with it remains very powerful. For example, in my project the model was able to identify many of the forbidden words with WordCount:
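As a rough illustration of the word-count idea, the sketch below splits URLs into tokens and counts how often each token appears in URLs labeled as spam. It is not the project's actual code: the toy URLs, the labels, the stop-word list, and the function names `tokenize_url` and `spam_word_counts` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop words: URL scaffolding that carries no spam signal.
STOP_WORDS = frozenset({"http", "https", "www", "com", "net", "org"})


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def spam_word_counts(urls, labels):
    """Count tokens across the URLs whose label marks them as spam."""
    counts = Counter()
    for url, is_spam in zip(urls, labels):
        if is_spam:
            counts.update(t for t in tokenize_url(url) if t not in STOP_WORDS)
    return counts


# Toy data, invented for illustration only.
urls = [
    "https://www.example.com/free-prize-winner",
    "https://www.example.com/docs/getting-started",
    "http://example.net/claim-free-prize",
    "https://example.org/blog/release-notes",
]
labels = [True, False, True, False]

counts = spam_word_counts(urls, labels)
for word, n in counts.most_common(5):
    print(word, n)
```

Tokens that show up often in spam URLs but rarely in legitimate ones are the candidates for "forbidden words". On a real dataset the stop-word list would need to be adapted to the data.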
Our SVC model reached 96% accuracy after hyperparameter tuning, which is very good. It could probably be improved further with deeper dataset cleaning and more training, but for the purpose of this exploration the result was quite good.
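To make the tuning step more concrete, here is a minimal sketch of an SVC text classifier tuned with a grid search in scikit-learn. It assumes the tuning was a search over SVC parameters such as `C` and `kernel`, which the write-up does not confirm. The toy URLs, the labels, the TF-IDF features, and the parameter grid are all assumptions for illustration, and the score printed on this tiny dataset has nothing to do with the 96% reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data, invented for illustration only (1 = spam, 0 = legitimate).
urls = [
    "https://www.example.com/free-prize-winner",
    "http://example.net/claim-free-prize",
    "http://example.net/win-cash-now",
    "https://www.example.com/cheap-pills-free",
    "https://www.example.com/docs/getting-started",
    "https://example.org/blog/release-notes",
    "https://example.org/about/team",
    "https://www.example.com/docs/api-reference",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Turn each URL into TF-IDF features over its alphanumeric tokens,
# then classify with a support vector classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")),
    ("svc", SVC()),
])

# Hypothetical search space; a real project would pick its own values.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

# cv=2 only because the toy dataset is tiny; use more folds on real data.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(urls, labels)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```

Putting the vectorizer inside the pipeline means it is refit on each training fold, so the cross-validated accuracy is not inflated by vocabulary learned from the held-out URLs.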