This package contains code for spam detection in the Vietnamese language. It includes machine learning models trained on Vietnamese text data to classify messages as spam or non-spam. It also provides a user-friendly UI application, built with the PyQt6 library, for demonstration and testing purposes.
To use this package, follow these steps:
- Clone the repository:
git clone https://github.com/ankhanhtran02/Vietnamese-Spam-Detection.git
- Navigate to the project directory:
cd Vietnamese-Spam-Detection
- Install dependencies:
pip install -r requirements.txt
To run the UI app for demonstration:
- Ensure you have completed the installation steps.
- Run the following command:
python main.py
- The left side of the window that opens contains a text box holding the messages you want to classify. To add messages, use the Add text and Read text from file buttons, then press the PREDICT button in the middle. The output appears on the right side of the window: messages classified as non-spam have the string "[0]" appended and are shown in blue, while spam messages have "[1]" appended and are shown in red. You can try the prepared text files in the demos folder. To switch the algorithm used for prediction, use the Choose an algorithm combobox.
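The output convention described above can be sketched in a few lines of Python. This is only an illustration, not the app's actual code: the helper name `format_prediction`, the `LABEL_COLORS` mapping, and the single space placed before the appended label are all assumptions.

```python
from typing import Tuple

# Hypothetical mapping, following the description above:
# label 0 = non-spam (shown in blue), label 1 = spam (shown in red).
LABEL_COLORS = {0: "blue", 1: "red"}


def format_prediction(message: str, label: int) -> Tuple[str, str]:
    """Return (display text, color name) for one classified message.

    Assumption: the label is appended after a single space; the real
    app may join the message and the label differently.
    """
    if label not in LABEL_COLORS:
        raise ValueError(f"label must be 0 or 1, got {label!r}")
    return f"{message} [{label}]", LABEL_COLORS[label]
```

For example, `format_prediction("Xin chào", 0)` returns the text with `[0]` appended, paired with the color name `"blue"`.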
To rerun the experiments described in our report and verify the results:
- Ensure you have completed the installation steps.
- Run the following commands to see the evaluation of each of the 5 algorithms on the test set when using different vectorizers:
python KNN.py
python SVM.py
python ANN.py
python logistic_regression.py
python naive_bayes.py
- Run the following command to see the comparison between different vectorizers when using the majority vote model:
python vectorizers_comparison.py
- Run the following command to see the evaluation of the baseline model:
python baseline_system.py
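For readers unfamiliar with majority voting, the idea behind the majority vote model mentioned above can be sketched as follows. This is a minimal illustration and not the repository's implementation: it assumes that each classifier outputs 0 (non-spam) or 1 (spam) for every message and that the final label is the one chosen by more than half of them. The function names are hypothetical.

```python
from typing import List, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 (spam) if more than half of the votes are 1, else 0.

    With an odd number of voters (for example five classifiers) a tie
    cannot occur; with an even number, this sketch resolves ties to 0.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return 1 if sum(votes) * 2 > len(votes) else 0


def majority_vote_batch(predictions_per_model: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model predictions into one label per message.

    predictions_per_model[i][j] is model i's label for message j.
    """
    return [majority_vote(column) for column in zip(*predictions_per_model)]
```

Given five models that each label two messages, `majority_vote_batch` takes the predictions column by column, so each message gets the label that most models agreed on.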
We thank the following contributors to this project:
- ankhanhtran02: Preprocessing and logistic regression implementation, experiments
- Decent-Cypher: KNN implementation, UI design
- KingNoob2022: ANN implementation
- AndrewNguyen4: Naive Bayes implementation
- Vinh.TT: SVM implementation
We also appreciate the help of our classmates, friends, and families, who contributed additional spam message samples to our dataset. These samples were crucial to the overall performance of our algorithms and the validity of our experiments.