Primary language: Jupyter Notebook

Spam vs. Ham Text Classifier

This repository contains a natural language processing pipeline that classifies SMS messages as spam or legitimate (ham).

Words are mapped to GloVe word embeddings: continuous vector representations that capture semantic relationships between words. Stopwords (very common words such as "the" or "is" that carry little meaning on their own) are also filtered out using the NLTK stopword list, which reduces noise in the input and is intended to improve classification performance.
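The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: it uses a small hard-coded stopword set as a stand-in for NLTK's list (in practice that list would come from `nltk.corpus.stopwords.words("english")` after `nltk.download("stopwords")`), and it turns each message into one fixed-length vector by averaging the GloVe vectors of its words, which is a common approach but is not confirmed by this README. The helper names `load_glove`, `tokenize`, and `embed` are hypothetical.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list, kept tiny so the
# example runs without downloading any corpora.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "your", "i", "me",
             "and", "of", "for", "in", "on", "at", "this", "it"}


def load_glove(lines):
    """Parse GloVe's plain-text format: each line is a word followed by
    its space-separated vector components."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if parts:
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def tokenize(text):
    """Lowercase the message, extract word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def embed(text, embeddings, dim):
    """Average the vectors of tokens found in the embedding table.
    Returns a zero vector when no token is known (assumed fallback)."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

With a real GloVe file such as `glove.6B.100d.txt`, `load_glove` could be given an open file handle (`open(path, encoding="utf-8")`) and `dim` set to 100. Averaging discards word order, but it yields a fixed-length vector that a simple classifier can consume directly.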

A Naive Bayes classifier is trained on the labelled messages and then predicts whether each incoming message is spam or ham (non-spam).
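The README does not say which Naive Bayes variant is used. Because averaged GloVe vectors are continuous and can be negative (which rules out multinomial Naive Bayes, as that variant expects non-negative counts), a Gaussian variant is a plausible fit. In a notebook this would typically be scikit-learn's `GaussianNB`; the sketch below spells out the same idea in plain Python so the mechanics are visible. The class name `GaussianNaiveBayes` and its `var_smoothing` parameter are hypothetical and only loosely modeled on scikit-learn.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: each feature is modeled as an
    independent normal distribution within each class."""

    def __init__(self, var_smoothing=1e-9):
        # Small constant added to every variance to avoid division by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        n = len(X)
        self.classes_ = sorted(groups)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            rows = groups[c]
            self.priors_[c] = len(rows) / n          # class prior P(c)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            self.means_[c] = means
            self.vars_[c] = variances
        return self

    def _log_posterior(self, x, c):
        # log P(c) + sum of per-feature Gaussian log-likelihoods
        log_p = math.log(self.priors_[c])
        for v, m, var in zip(x, self.means_[c], self.vars_[c]):
            log_p += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return log_p

    def predict(self, X):
        # Pick the class with the highest (unnormalized) log posterior.
        return [max(self.classes_, key=lambda c: self._log_posterior(x, c))
                for x in X]
```

Working in log space avoids numeric underflow when many per-feature likelihoods are multiplied together, which matters for 100-dimensional embeddings.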

Dataset: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset