- This project uses the emails.csv file, which provides information about different emails.
- The emails.csv file has a target column, "Prediction", that identifies whether a mail is spam or not.
- To train the models, we build a dataset that contains an equal number of spam and ham mails.
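
One way to build such a balanced dataset is to downsample the larger class. The sketch below is only an illustration in plain Python: the `balance` helper, the toy rows, and the assumption that "Prediction" is 1 for spam and 0 for ham are not taken from the notebooks, which may do this differently.

```python
import random


def balance(rows, label_key="Prediction", seed=0):
    """Downsample the larger class so spam and ham counts match.

    Assumes label_key holds 1 for spam and 0 for ham (an assumption,
    not confirmed by the dataset description).
    """
    spam = [r for r in rows if r[label_key] == 1]
    ham = [r for r in rows if r[label_key] == 0]
    n = min(len(spam), len(ham))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    balanced = rng.sample(spam, n) + rng.sample(ham, n)
    rng.shuffle(balanced)  # mix the two classes together
    return balanced


# Toy rows standing in for emails.csv: 3 spam mails and 7 ham mails.
rows = [{"id": i, "Prediction": 1 if i < 3 else 0} for i in range(10)]
balanced = balance(rows)
```

Here `balanced` keeps all 3 spam rows and a random 3 of the 7 ham rows, so neither class dominates training.
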
In the testing step, we evaluate the data on three different models:
- Naive Bayes: A probabilistic model commonly used for text classification. It uses the words in a mail to decide whether the mail is spam or not.
- Support Vector Machine: A model for standard classification problems. SVMs follow the maximal-margin principle: they pick the decision boundary that leaves the widest margin between the two classes.
- Random Forest Classifier: An ensemble method that combines many weak models (decision trees) into a much stronger one.
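
The comparison of the three models can be sketched with scikit-learn as follows. This is a minimal illustration, not the code from Test Data.ipynb: the synthetic word-count matrix, the linear SVM kernel, and all parameter values are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for emails.csv: 200 mails x 20 word counts,
# with balanced labels (1 = spam, 0 = ham; assumed encoding).
y = np.array([0, 1] * 100)
X = rng.poisson(1.0, size=(200, 20))
# Make the first five "words" more frequent in spam mails.
X[y == 1, :5] += rng.poisson(3.0, size=(100, 5))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Naive Bayes": MultinomialNB(),
    "Support Vector Machine": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out mails.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

`MultinomialNB` expects non-negative features such as word counts, which is why the toy data is drawn from a Poisson distribution.
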
- CASE 1: Take the word 'Greetings'. Say it is present in both 'Spam' and 'Not Spam' mails.
- CASE 2: Take the word 'lottery'. Say it is present only in spam mails.
- CASE 3: Take the word 'cheap'. Say it is present only in spam mails.
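
To make the cases above concrete, the sketch below compares how often each word appears in spam and in non-spam mails, which is the kind of evidence Naive Bayes relies on. The toy mails, the helper names, and the use of Laplace smoothing are assumptions for illustration only.

```python
from collections import Counter

# Toy mails (hypothetical): 'greetings' appears in both classes,
# 'lottery' and 'cheap' appear only in spam.
spam_mails = [
    "greetings you won the lottery",
    "cheap lottery tickets",
    "greetings cheap offer",
]
ham_mails = [
    "greetings from the team",
    "meeting notes attached",
    "greetings see you tomorrow",
]


def word_doc_counts(mails):
    """Count in how many mails each word appears at least once."""
    counts = Counter()
    for mail in mails:
        counts.update(set(mail.split()))
    return counts


spam_counts = word_doc_counts(spam_mails)
ham_counts = word_doc_counts(ham_mails)


def likelihood(word, counts, n_mails, alpha=1):
    """Estimate P(word present | class) with Laplace smoothing."""
    return (counts[word] + alpha) / (n_mails + 2 * alpha)


def spam_ratio(word):
    """Ratio above 1 means the word is evidence for spam."""
    p_spam = likelihood(word, spam_counts, len(spam_mails))
    p_ham = likelihood(word, ham_counts, len(ham_mails))
    return p_spam / p_ham


for word in ("greetings", "lottery", "cheap"):
    print(word, spam_ratio(word))
```

In this toy data 'greetings' gets a ratio of exactly 1, so it does not help separate the classes, while 'lottery' and 'cheap' get ratios above 1 and push a mail toward the spam class. Smoothing keeps the ratio finite even though those two words never occur in the non-spam mails.
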
- Clone the GitHub repository to your computer.
- Run Train Data.ipynb to prepare the data on your computer.
- Run Test Data.ipynb to check how accurately the models classify the emails.