Naive-Bayes-Classifier

This project is an implementation of the Naive Bayes algorithm to classify documents from the 20 Newsgroups data set.

Naive Bayes Network

The classification code implements a Naive Bayes classifier for the 20 Newsgroups data set. This data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups, each corresponding to a different topic. It was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews" (Ken Lang, Proceedings of the Twelfth International Conference on Machine Learning, 331-339, 1995), though he did not explicitly mention this collection there.

The 20 Newsgroups collection has since become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.
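
The following minimal NumPy sketch shows how a Naive Bayes classifier scores a document. It is not the project's code. It uses the multinomial event model, a common choice for word counts, though the project may use a different variant. It also assumes the following:

- Bag-of-words counts are stored in a dense array X, with one row per document and one column per vocabulary word.
- Labels y are integers in 0..n_classes-1.
- Additive (Laplace) smoothing is controlled by alpha.

The function names are hypothetical. Prediction works in log space and picks the class that maximizes log P(c) + sum over w of x_w * log P(w | c). This avoids the underflow that comes from multiplying thousands of small probabilities.

```python
import numpy as np

def fit_multinomial_nb(X, y, n_classes, alpha=1.0):
    """Estimate log priors and log word likelihoods (hypothetical helper).

    X: (n_docs, n_words) non-negative word counts; y: labels in 0..n_classes-1.
    alpha > 0 is an additive smoothing constant; alpha=1 is add-one (Laplace) smoothing.
    """
    n_docs, n_words = X.shape
    # Prior P(c): fraction of training documents that belong to each class.
    class_doc_counts = np.bincount(y, minlength=n_classes)
    log_prior = np.log(class_doc_counts / n_docs)
    # Total count of every word within every class, shape (n_classes, n_words).
    word_counts = np.zeros((n_classes, n_words))
    np.add.at(word_counts, y, X)
    # Smoothed P(w | c) = (count(w, c) + alpha) / (words in c + alpha * |V|).
    smoothed = word_counts + alpha
    log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
    return log_prior, log_likelihood

def predict_multinomial_nb(X, log_prior, log_likelihood):
    """Return argmax_c [log P(c) + sum_w x_w * log P(w | c)] for each row of X."""
    scores = X @ log_likelihood.T + log_prior
    return np.argmax(scores, axis=1)
```

Here is an example call:

```python
predict_multinomial_nb(X_test, *fit_multinomial_nb(X_train, y_train, n_classes=20))
```

It returns one predicted label per test document. For the full 20 Newsgroups vocabulary, a dense X can be very large, so a sparse representation may be preferable in practice.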

The original data set is available at http://qwone.com/~jason/20Newsgroups/.

The code targets Python 3.7. It requires the numpy, pandas and scikit-learn packages (scikit-learn for sklearn.metrics), and it also uses the standard-library time module.

The code should be placed next to the '20newsgroups' folder, which should contain these CSV files:

./20newsgroups/train_data.csv
./20newsgroups/train_label.csv
./20newsgroups/test_data.csv
./20newsgroups/test_label.csv
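
One possible way to read these files is sketched below. It assumes they follow a layout similar to the Matlab/Octave version distributed on the data set's page. Neither layout is confirmed here:

- Each row of a *_data.csv file is "docIdx,wordIdx,count" with 1-based indices.
- Each *_label.csv file holds one 1-based class label per line, with line i labelling document i.
- The files have no header row; if they do, pass skiprows=1 to np.loadtxt.

The function name load_class_word_counts is hypothetical. The loader aggregates the rows straight into per-class word totals, which is all that Naive Bayes training needs, so the large document-by-word matrix is never built.

```python
import numpy as np

def load_class_word_counts(data_src, label_src, n_classes=20, n_words=None):
    """Aggregate (docIdx, wordIdx, count) rows into per-class word totals.

    ASSUMPTION: rows are "docIdx,wordIdx,count" with 1-based indices and the
    label source holds one 1-based class label per line (line i -> document i).
    data_src / label_src may be file paths or anything np.loadtxt accepts.
    """
    triplets = np.loadtxt(data_src, delimiter=",", dtype=np.int64, ndmin=2)
    labels = np.loadtxt(label_src, dtype=np.int64, ndmin=1) - 1  # shift to 0-based
    doc_idx = triplets[:, 0] - 1
    word_idx = triplets[:, 1] - 1
    counts = triplets[:, 2]
    if n_words is None:
        # Infer the vocabulary size from the largest word index seen.
        n_words = int(word_idx.max()) + 1
    # Per-class word totals, shape (n_classes, n_words).
    class_word_counts = np.zeros((n_classes, n_words), dtype=np.int64)
    np.add.at(class_word_counts, (labels[doc_idx], word_idx), counts)
    # Number of documents per class, used for the prior P(c).
    class_doc_counts = np.bincount(labels, minlength=n_classes)
    return class_word_counts, class_doc_counts
```

Here is an example call:

```python
load_class_word_counts("./20newsgroups/train_data.csv", "./20newsgroups/train_label.csv")
```

It returns the training counts. Pass n_words explicitly when loading the test split so that both splits share the same vocabulary size.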

A short report comparing the performance of the Maximum Likelihood Estimator and the Naive Bayes Estimator is attached.
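
Toy counts make the kind of difference such a comparison measures visible. This sketch assumes the "Naive Bayes Estimator" refers to a Bayesian estimate of P(w | c) with additive (Laplace) smoothing. That is a common choice, but the source does not confirm it, and the function names are hypothetical.

The maximum likelihood estimate assigns zero probability to any word never seen with a class. A single such word therefore drives that class's log-likelihood to -inf. Smoothing keeps every probability strictly positive.

```python
import numpy as np

def mle_word_probs(class_word_counts):
    """Maximum likelihood estimate: P(w | c) = count(w, c) / total words in c."""
    totals = class_word_counts.sum(axis=1, keepdims=True)
    return class_word_counts / totals

def smoothed_word_probs(class_word_counts, alpha=1.0):
    """Additive smoothing (ASSUMED Bayesian estimator): (count + alpha) / (total + alpha * |V|)."""
    smoothed = class_word_counts + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def doc_log_likelihood(word_probs, doc_counts):
    """Return sum_w x_w * log P(w | c) for every class c (log P(doc | c) up to a constant)."""
    with np.errstate(divide="ignore"):  # log(0) -> -inf without a warning
        log_p = np.log(word_probs)
    # Sum only over words that occur in the document, so 0 * log(0) never yields nan.
    present = doc_counts > 0
    return log_p[:, present] @ doc_counts[present]
```

Consider class counts [[5, 0, 1], [0, 7, 1]] and a document that contains words 0 and 1 once each. Under the maximum likelihood estimate, the log-likelihood is -inf for both classes, so the classifier cannot choose between them. The smoothed estimate gives finite scores for both classes.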