In this project, I analyze the SMS Spam Collection dataset and build a machine learning model to predict whether a message is spam or not.
This project contains an executable IPython notebook, a presentation, and the data source, as follows:
- SMS_Spam_Classifier.ipynb - Google Colab notebook containing data summary, exploration, visualisations, text processing, modelling and performance evaluation.
- SMSSpamCollection - The SMS Spam Collection dataset: one message per line, with a ham or spam label followed by a tab and the raw message text.
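Given that one-message-per-line format, reading the file needs nothing beyond the standard library. The following is a minimal sketch, assuming each line is a label, a tab, then the message. The function name load_sms_collection and the sample lines are hypothetical and made up for illustration. The notebook itself may load the file differently (for example with pandas).

```python
import io
from collections import Counter
from typing import Iterable, List, Tuple


def load_sms_collection(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'label<TAB>message' lines into (label, message) pairs."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so any later tab stays in the message.
        label, message = line.split("\t", 1)
        records.append((label, message))
    return records


# Made-up sample lines in the assumed format. With the real file you might use:
#   with open("SMSSpamCollection", encoding="utf-8") as f:
#       data = load_sms_collection(f)
sample = io.StringIO(
    "ham\tSee you at lunch?\n"
    "spam\tWINNER! Claim your free prize now\n"
    "ham\tRunning late, be there in 10\n"
)
data = load_sms_collection(sample)
print(Counter(label for label, _ in data))  # Counter({'ham': 2, 'spam': 1})
```

Splitting on the first tab only keeps the parser robust if a message happens to contain a tab. Depending on the file's encoding, you may need to adjust the encoding argument when opening it.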
Almost everyone today owns a mobile phone with messaging and calling capabilities. Spam calls are notorious for keeping phones ringing constantly as they push promotional or fraudulent content at unsuspecting customers. However, as wireless networks have lowered the cost of bulk messaging, much of this spam has shifted to SMS. In this setting, automatically classifying messages becomes essential. The objective of this project is to understand the SMS Spam Collection dataset and build a machine learning model to predict whether a message is spam or not.
- Understanding the business task.
- Reading data from files given.
- Data pre-processing.
- Data visualization.
- Text processing.
- Modelling data.
- Conclusion.
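The text processing and modelling steps listed above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the notebook's code. It swaps in simple regex normalisation plus scikit-learn's built-in English stop-word list in place of any stemming or lemmatization, builds TF-IDF features, and fits ComplementNB on a tiny made-up training set. The helper normalise and all example messages are hypothetical.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline


def normalise(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters to a space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Tiny made-up training set; the notebook trains on SMSSpamCollection instead.
train_texts = [
    "WINNER! Claim your free prize now",
    "Free entry: text WIN to claim a cash prize",
    "Urgent! Your account has won a free holiday, call now",
    "See you at lunch?",
    "Running late, be there in 10",
    "Can you send me the notes from class",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TF-IDF features (English stop words removed) feeding a Complement Naive Bayes classifier.
model = make_pipeline(
    TfidfVectorizer(preprocessor=normalise, stop_words="english"),
    ComplementNB(),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["Claim your free cash prize now", "Running late, see you in class"])
print(predictions.tolist())  # ['spam', 'ham']
```

Complement Naive Bayes is a reasonable default here because scikit-learn documents it as suited to imbalanced data, and in this dataset ham messages far outnumber spam. On the real data you would also hold out a test split to evaluate performance rather than predicting on hand-picked messages.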
Midhun R | Avid Learner | Data Analyst | Data Scientist | Machine Learning Enthusiast
Contact me for Data Science Project Collaborations
Analytics Vidhya, 'Stemming vs Lemmatization in NLP: Must-Know Differences'. [Online].
Available: https://www.analyticsvidhya.com/blog/2022/06/stemming-vs-lemmatization-in-nlp-must-know-differences/
Medium, 'Fundamentals of Bag Of Words and TF-IDF'. [Online].
Available: https://medium.com/analytics-vidhya/fundamentals-of-bag-of-words-and-tf-idf-9846d301ff22/
Scikit-learn, 'sklearn.naive_bayes.ComplementNB'. [Online].
Available: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.ComplementNB.html
Image by upklyak on Freepik