Naive Bayes and Logistic Regression
Naive Bayes Algorithm
source code: NaiveBayes.py
prerequisite files: ham_train.csv, spam_train.csv, spam_test.txt, ham_test.txt
generated files: The_most_important_words.txt
Logistic Regression Algorithm
source code: Logistic_regression.m (main), grad_check.m (func), h_theta.m (func), logistic_regression_vec.m (func)
prerequisite files: train_matrix.txt, test_matrix.txt
generated files: None
Generating the data required for logistic regression
source code: generateDataMatrix.py
prerequisite files: all_word_map.pkl, The_most_important_words.txt, ./train/spam, ./train/ham, ./test/spam, ./test/ham
generated files: train_matrix.txt, test_matrix.txt
These paths are provided in Emails.zip: ./train/spam, ./train/ham, ./test/spam, ./test/ham
These files need to be renamed by deleting the appended number (for example, train_matrix600.txt becomes train_matrix.txt): train_matrix600.txt, train_matrix1200.txt, test_matrix600.txt, test_matrix1200.txt
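The renaming step can be scripted. The following is a minimal sketch, not part of the original sources: the helper names strip_appended_number and select_matrices are hypothetical, and it assumes the numbered matrix files sit in the working directory. It copies rather than renames, because the 600 and 1200 variants both map to the same unnumbered name, so only one size can be active at a time.

```python
import re
import shutil
from pathlib import Path


def strip_appended_number(name):
    """Drop the digits just before '.txt': 'train_matrix600.txt' -> 'train_matrix.txt'."""
    return re.sub(r"\d+(?=\.txt$)", "", name)


def select_matrices(size, directory="."):
    """Copy the train/test matrices of the given size to the unnumbered names.

    Hypothetical helper: 'size' is the appended number (600 or 1200 in this
    repository). The numbered originals are left in place so the other size
    can be selected later.
    """
    created = []
    for prefix in ("train_matrix", "test_matrix"):
        src = Path(directory) / f"{prefix}{size}.txt"
        dst = src.with_name(strip_appended_number(src.name))
        shutil.copyfile(src, dst)  # overwrites any previously selected copy
        created.append(dst.name)
    return created


if __name__ == "__main__":
    # Assumption: the 600 variant is the one wanted for this run.
    print(select_matrices(600))
```

Calling select_matrices(1200) afterwards would overwrite train_matrix.txt and test_matrix.txt with the 1200 variants.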