As a personal project, I implemented a spam filter for SMS messages using the Multinomial Naive Bayes algorithm. The work involved handling, cleaning, and processing a large SMS dataset. I built the model in Python, using libraries such as pandas, matplotlib, and scikit-learn.
I first split the dataset into training and test sets, then cleaned the text by removing special characters and converting it to lowercase. From the cleaned training set, I built a vocabulary of all distinct words and a dictionary of word counts for each SMS.
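The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the project's actual pandas code: the toy `messages` list, the 80/20 split ratio, the fixed seed, and the helper names `clean` and `tokenize` are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Hypothetical toy data standing in for the SMS dataset: (label, text) pairs.
messages = [
    ("ham", "Are we still meeting at 5?"),
    ("spam", "WINNER!! Claim your FREE prize now!!!"),
    ("ham", "I'll call you later, ok?"),
    ("spam", "Free entry: text WIN to 80086 now"),
    ("ham", "Lunch tomorrow?"),
]


def clean(text):
    """Replace every non-word character with a space and lowercase the result."""
    return re.sub(r"\W", " ", text).lower()


def tokenize(text):
    """Clean a message and split it into a list of words."""
    return clean(text).split()


# Shuffle with a fixed seed, then hold out 20% for testing (assumed ratio).
shuffled = messages[:]
random.Random(1).shuffle(shuffled)
split_at = int(len(shuffled) * 0.8)
train, test = shuffled[:split_at], shuffled[split_at:]

# Vocabulary: every distinct word seen in the training messages.
vocabulary = sorted({word for _, text in train for word in tokenize(text)})

# One word-count dictionary per training message.
word_counts = [Counter(tokenize(text)) for _, text in train]
```

Building the vocabulary from the training set only keeps the test messages unseen until evaluation.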
Finally, I trained the Multinomial Naive Bayes model on the training set and measured its accuracy on the test set. The project gave me practical experience with large datasets and with data cleaning and processing techniques, as well as a deeper understanding of the Multinomial Naive Bayes algorithm and its application to natural language processing.
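As an illustration of the training and evaluation step, here is a small from-scratch sketch of Multinomial Naive Bayes with Laplace (add-one) smoothing. It is not necessarily how the project implemented it, which could equally have relied on scikit-learn. The function names `train_nb`, `predict`, and `accuracy`, the smoothing parameter `alpha`, and the toy token lists are all hypothetical.

```python
import math
from collections import Counter


def train_nb(samples, alpha=1.0):
    """Fit multinomial Naive Bayes on (label, tokens) pairs with additive smoothing."""
    label_counts = Counter(label for label, _ in samples)
    word_counts = {label: Counter() for label in label_counts}
    for label, tokens in samples:
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}

    model = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        denominator = total_words + alpha * len(vocabulary)
        model[label] = {
            # log P(label)
            "log_prior": math.log(label_counts[label] / len(samples)),
            # log P(word | label), smoothed so unseen pairs are never zero
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denominator)
                for w in vocabulary
            },
        }
    return model


def predict(model, tokens):
    """Return the label with the highest log-posterior score."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for w in tokens:
            # Words that never appeared in training are skipped.
            if w in params["log_likelihood"]:
                score += params["log_likelihood"][w]
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, tokens) == label for label, tokens in samples)
    return correct / len(samples)


# Hypothetical, already-tokenized toy data.
train_set = [
    ("spam", ["free", "prize", "now"]),
    ("spam", ["win", "free", "cash"]),
    ("ham", ["see", "you", "at", "lunch"]),
    ("ham", ["call", "you", "later"]),
]
test_set = [
    ("spam", ["free", "cash", "now"]),
    ("ham", ["see", "you", "later"]),
]

model = train_nb(train_set)
test_accuracy = accuracy(model, test_set)
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on longer messages.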
Tools used:
- Python
- pandas
- matplotlib
- scikit-learn
This project demonstrates my skills in data handling, cleaning, and processing, as well as my ability to build, train, and evaluate a machine learning model. It was an excellent learning experience that strengthened my understanding of natural language processing.