Adam Terwilliger
Document Classification Using the Naïve Bayes Algorithm
The basic idea is to write a program that, given a collection of training data consisting of category-labeled documents, “learns” how to classify new documents into the correct category using a Naïve Bayes classifier.
The Naïve Bayes algorithm uses probabilities to perform classification. The probabilities are estimated from training data for which the correct classification is known (i.e., it is another form of supervised learning). The algorithm is called “naïve” because it makes the simplifying assumption that attribute values are conditionally independent of one another, given the classification.
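The idea described above can be illustrated with a minimal Python sketch. It is not the required implementation, only one possible reading under stated assumptions: documents are represented as lists of words, the attributes are word occurrences, add-one (Laplace) smoothing is used so that unseen word/category pairs do not yield a zero probability, words never seen in training are ignored, and log-probabilities are summed to avoid numeric underflow. The function names `train` and `classify` and the tiny training set are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Estimate counts from (category, list_of_words) training pairs."""
    doc_counts = Counter()               # number of documents per category
    word_counts = defaultdict(Counter)   # word frequencies per category
    vocab = set()                        # every word seen in training
    for category, words in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(words)
        vocab.update(words)
    return doc_counts, word_counts, vocab


def classify(words, model):
    """Return the category with the highest (log) posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # log P(category), estimated from document frequencies
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for w in words:
            if w not in vocab:
                continue  # assumption: skip words never seen in training
            # log P(w | category) with add-one smoothing (an assumption);
            # summing these terms is where the "naive" independence
            # assumption is applied
            count = word_counts[category][w]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training data, for illustration only
training = [
    ("sports", "the team won the game".split()),
    ("sports", "a great game by the home team".split()),
    ("politics", "the senate passed the bill".split()),
    ("politics", "voters elect a new senate".split()),
]
model = train(training)
print(classify("the team played a great game".split(), model))  # sports
```

Because the per-word terms are simply added together, each word contributes to the score independently of the others once the category is fixed, which is exactly the simplifying assumption that gives the algorithm its name.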