/practice_machine_learning

Machine Learning IPython Notebooks

  1. Classify spam/ham with data from the machine learning repository using scikit-learn's TfidfVectorizer
    • Load the data and preprocess it by removing stopwords and punctuation and lowercasing all characters.
    • Check the actual spam and ham counts in the data, and get the top words associated with each class.
    • Vectorize the text with TfidfVectorizer, since it performed better than CountVectorizer in this notebook.
    • Fit RandomForestClassifier and MultinomialNB on the vectorized matrix and compare the results.
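
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy messages in `TRAIN` and `TEST`, the helper names `preprocess`, `top_words` and `compare_models`, the use of scikit-learn's built-in English stopword list, and accuracy as the comparison metric are all assumptions made for the example.

```python
import string
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy messages standing in for the real spam/ham dataset.
TRAIN = [
    ("spam", "WIN a FREE prize now! Call today."),
    ("spam", "FREE entry: text WIN to claim your cash"),
    ("spam", "Claim your FREE ringtone, reply now"),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "Can you send me the notes from class?"),
    ("ham", "I'll be home late, see you at dinner"),
]
TEST = [
    ("spam", "Text now to win a free cash prize"),
    ("ham", "See you at lunch after class"),
]


def preprocess(text):
    """Lowercase, strip punctuation, and drop English stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in text.split() if w not in ENGLISH_STOP_WORDS)


def top_words(pairs, label, n=3):
    """Return the n most frequent preprocessed words for one class."""
    counts = Counter(
        word
        for lab, text in pairs
        if lab == label
        for word in preprocess(text).split()
    )
    return counts.most_common(n)


def compare_models(train, test):
    """Fit both classifiers on TF-IDF features and return test accuracy."""
    y_train = [lab for lab, _ in train]
    y_test = [lab for lab, _ in test]

    # Fit the vocabulary on training text only, then reuse it for test text.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform([preprocess(t) for _, t in train])
    X_test = vectorizer.transform([preprocess(t) for _, t in test])

    models = {
        "RandomForestClassifier": RandomForestClassifier(
            n_estimators=100, random_state=0
        ),
        "MultinomialNB": MultinomialNB(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    print(Counter(lab for lab, _ in TRAIN))  # class counts
    print(top_words(TRAIN, "spam"))
    print(compare_models(TRAIN, TEST))
```

On a dataset this small the accuracy figures are not meaningful; with the real data, a held-out split and per-class metrics such as precision and recall would likely give a fairer comparison, since spam and ham counts are usually imbalanced.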