Machine-Learning-Classifying-Emails-into-Spams-and-Hams

Naive Bayes and Logistic Regression

Naive Bayes Algorithm

source code: NaiveBayes.py

prerequisite files: ham_train.csv, spam_train.csv, spam_test.txt, ham_test.txt

generated files: The_most_important_words.txt
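
For orientation, here is a minimal sketch of a Naive Bayes spam/ham classifier. It is not the code in NaiveBayes.py: it assumes the multinomial variant with add-one (Laplace) smoothing, and it works on small in-memory token lists because the layout of the CSV and TXT inputs is not described here. The names train_naive_bayes and classify are made up for this example.

```python
import math
from collections import Counter


def train_naive_bayes(docs_by_label):
    """Fit log-priors and smoothed log-likelihoods.

    docs_by_label maps a label (e.g. "spam") to a list of tokenized emails.
    """
    vocab = {w for docs in docs_by_label.values() for doc in docs for w in doc}
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter(w for doc in docs for w in doc)
        total_words = sum(counts.values())
        log_prior = math.log(len(docs) / total_docs)
        # Assumption: add-one smoothing so unseen words never get probability 0.
        log_likelihood = {
            w: math.log((counts[w] + 1) / (total_words + len(vocab)))
            for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def classify(model, tokens):
    """Return the label with the highest log-posterior score."""
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        # Words outside the training vocabulary are ignored.
        scores[label] = log_prior + sum(
            log_likelihood[w] for w in tokens if w in log_likelihood
        )
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
training = {
    "spam": [["win", "money", "now"], ["free", "money"]],
    "ham": [["meeting", "at", "noon"], ["lunch", "at", "noon"]],
}
model = train_naive_bayes(training)
spam_guess = classify(model, ["free", "money"])
ham_guess = classify(model, ["meeting", "noon"])
```

Working in log space avoids underflow when many small word probabilities are multiplied together.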

Logistic Regression Algorithm

source code: Logistic_regression.m (main), grad_check.m (func), h_theta.m (func), logistic_regression_vec.m (func)

prerequisite files: train_matrix.txt, test_matrix.txt

generated files: none
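
The MATLAB function names suggest a hypothesis function, a cost/gradient routine, and a numerical gradient check. The sketch below shows those three pieces in plain Python. It is only an illustration under assumptions: the sigmoid hypothesis, the unregularized cross-entropy cost, and the batch gradient descent loop (with its learning rate and step count) are guesses at a typical setup, not a port of the .m files, and the toy data is invented.

```python
import math


def h_theta(theta, x):
    """Sigmoid hypothesis: estimated probability that x is spam."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def cost_and_grad(theta, X, y):
    """Cross-entropy cost and its gradient, summed over all examples."""
    cost = 0.0
    grad = [0.0] * len(theta)
    for x, label in zip(X, y):
        p = h_theta(theta, x)
        cost -= label * math.log(p) + (1 - label) * math.log(1 - p)
        for j, xj in enumerate(x):
            grad[j] += (p - label) * xj
    return cost, grad


def grad_check(theta, X, y, eps=1e-5):
    """Largest gap between the analytic and central-difference gradients."""
    _, grad = cost_and_grad(theta, X, y)
    worst = 0.0
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += eps
        minus[j] -= eps
        numeric = (cost_and_grad(plus, X, y)[0]
                   - cost_and_grad(minus, X, y)[0]) / (2 * eps)
        worst = max(worst, abs(numeric - grad[j]))
    return worst


def fit(X, y, lr=0.1, steps=500):
    """Batch gradient descent; lr and steps are arbitrary example values."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        _, grad = cost_and_grad(theta, X, y)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Toy rows: [bias, count of a spam-like word, count of a ham-like word].
X = [[1, 0, 2], [1, 1, 3], [1, 3, 0], [1, 2, 0]]
y = [0, 0, 1, 1]
grad_err = grad_check([0.1, -0.2, 0.3], X, y)
theta = fit(X, y)
```

A small grad_err indicates that the analytic gradient agrees with the finite-difference estimate, which is the usual purpose of a gradient check before training.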

To generate necessary data for logistic regression

source code: generateDataMatrix.py

prerequisite files: all_word_map.pkl, The_most_important_words.txt, ./train/spam, ./train/ham, ./test/spam, ./test/ham

generated files: train_matrix.txt, test_matrix.txt
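
As a rough picture of what a data matrix for this kind of task can look like, the sketch below turns emails into rows of word counts. Everything about the layout is an assumption: that the word map sends each selected word to a column index, that tokens are split on whitespace, that the label goes in the last column, and that values are space-separated. generateDataMatrix.py may do any of this differently, and build_matrix and format_matrix are hypothetical names.

```python
def build_matrix(emails, labels, word_map):
    """Return one row per email: word counts followed by the label."""
    rows = []
    for text, label in zip(emails, labels):
        row = [0] * len(word_map)
        for token in text.lower().split():
            col = word_map.get(token)
            if col is not None:  # skip words that are not in the map
                row[col] += 1
        rows.append(row + [label])  # assumption: label stored last
    return rows


def format_matrix(rows):
    """Serialize rows as space-separated text, one email per line."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)


# Invented example data; 1 = spam, 0 = ham.
word_map = {"free": 0, "money": 1, "meeting": 2}
emails = ["Free money free", "meeting about money"]
labels = [1, 0]
rows = build_matrix(emails, labels, word_map)
matrix_text = format_matrix(rows)
```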

The following are in Emails.zip: ./train/spam, ./train/ham, ./test/spam, ./test/ham

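These files need to be renamed by deleting the appended number (for example, train_matrix600.txt becomes train_matrix.txt): train_matrix600.txt, train_matrix1200.txt, test_matrix600.txt, test_matrix1200.txt. Both pairs map to the same two names, so rename one pair at a time.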
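
The renaming can be done by hand or scripted. Below is a small sketch of one way to do it. The helper names are hypothetical, and rename_pair assumes the matrix files sit in a single directory. Note that Path.replace overwrites an existing train_matrix.txt or test_matrix.txt.

```python
import re
from pathlib import Path


def strip_appended_number(name):
    """Drop the digits that sit directly before the .txt extension."""
    return re.sub(r"\d+(?=\.txt$)", "", name)


def rename_pair(directory, number):
    """Rename train_matrix<number>.txt and test_matrix<number>.txt in place."""
    for prefix in ("train_matrix", "test_matrix"):
        source = Path(directory) / f"{prefix}{number}.txt"
        # Overwrites any file already using the target name.
        source.replace(source.with_name(strip_appended_number(source.name)))


names = ["train_matrix600.txt", "test_matrix1200.txt", "train_matrix.txt"]
renamed = [strip_appended_number(n) for n in names]
```

For example, rename_pair(".", 600) would select the pair whose names end in 600.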