This project builds a machine learning model that classifies SMS messages as either spam or ham (non-spam). The model is trained on a dataset of labeled SMS messages, using a Naive Bayes classifier with TF-IDF vectorization.
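As a rough illustration of the approach described above, the sketch below chains TF-IDF vectorization and a Naive Bayes classifier with scikit-learn. It is not the project's actual code: the choice of scikit-learn, the MultinomialNB variant, the default vectorizer settings, and the toy messages are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, made-up messages for illustration only (not taken from the real dataset).
train_texts = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free reward now",
    "are we still meeting for lunch",
    "see you at home tonight",
    "can you call me later",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TfidfVectorizer turns each message into a TF-IDF weighted term vector.
# MultinomialNB is an assumed Naive Bayes variant; the project may use another.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify two unseen messages.
predictions = model.predict(["claim your free prize", "see you at lunch"])
print(list(predictions))
```

Wrapping both steps in one pipeline means the vectorizer's vocabulary is learned only from the training messages and then reused unchanged at prediction time.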
The dataset used for this project is the SMS Spam Collection, available at the UCI Machine Learning Repository. It contains more than 5,000 SMS messages, each tagged as spam or ham.
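A minimal loading sketch follows, assuming the dataset is stored as one message per line in the form label, tab, message text, with no header row. The function name `load_sms` and the sample lines are hypothetical; check the layout of your downloaded copy before relying on this.

```python
import csv
from typing import Iterable


def load_sms(lines: Iterable[str]) -> tuple[list[str], list[str]]:
    """Parse tab-separated 'label<TAB>message' lines into (labels, texts)."""
    labels, texts = [], []
    # QUOTE_NONE keeps quotation marks inside messages as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or malformed lines (anything that is not exactly two fields).
        if len(row) != 2:
            continue
        label, text = row
        labels.append(label)
        texts.append(text)
    return labels, texts


# Made-up sample lines standing in for the real file contents.
sample = [
    'ham\tHe said "see you soon"\n',
    "spam\tClaim your free prize now\n",
    "\n",
]
labels, texts = load_sms(sample)
print(labels, texts)
```

To read from disk, pass an open file object instead of the list, for example `open(path, encoding="utf-8", newline="")`, where `path` points to your local copy.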
The model's performance is evaluated using accuracy, precision, recall, and F1-score. These metrics are calculated during the training run and printed at the end of the train.py script.
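To make the four metrics concrete, here is a small dependency-free sketch that computes them from true and predicted labels. It treats spam as the positive class, which is an assumption; the function name `classification_metrics` is hypothetical and train.py may compute these values differently (for instance via a library).

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall and F1 for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the messages flagged as spam, how many really were spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real spam messages, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam", "ham", "ham"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because ham messages usually far outnumber spam, accuracy alone can look good even when many spam messages slip through, which is why precision, recall, and F1 are reported alongside it.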
- Experiment with different machine learning algorithms and hyperparameter tuning techniques.
- Implement more advanced text preprocessing techniques to improve model performance.
This project is licensed under the MIT License - see the LICENSE file for details.