News-Classification

A Python news classification project using the TagMyNews dataset.

Classification of News Headlines

News headline classification using multiple machine learning models, with a comparison of their results.

Models implemented:

  • Multinomial Naive Bayes
  • Support Vector Machines
  • Neural Network with Softmax Layer
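
As a rough illustration of how three such models could be trained side by side, here is a minimal sketch using sklearn. It is not the project's actual code: the headlines are made up, `LinearSVC` is assumed as the SVM variant, and `LogisticRegression` (a single softmax layer) stands in for the neural network with a softmax layer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Made-up headlines for illustration only (not taken from TagMyNews).
texts = [
    "team wins championship game",
    "player scores goal in final game",
    "stocks fall as market slides",
    "bank reports quarterly profit market",
    "new vaccine trial shows promise",
    "doctors warn about flu season",
]
labels = ["Sport", "Sport", "Business", "Business", "Health", "Health"]

# Turn the text into token counts shared by all three models.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Assumption: LogisticRegression is a stand-in for the softmax network.
models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Softmax": LogisticRegression(max_iter=1000),
}

# Fit each model and classify one unseen headline.
query = vectorizer.transform(["bank profit lifts stocks market"])
predictions = {}
for name, model in models.items():
    model.fit(X, labels)
    predictions[name] = model.predict(query)[0]

for name, predicted in predictions.items():
    print(f"{name}: {predicted}")
```

Keeping the models in one dictionary makes it easy to run the same evaluation over each of them and compare the results.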

Metrics used to evaluate the performance of models:

  • Precision
  • Recall
  • F1 Score

We evaluate each classifier's ability to select the appropriate category given an article’s title and a brief article description. A confusion matrix is built to explore the results and to calculate the metrics.
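
To make the link between the confusion matrix and the three metrics concrete, here is a small dependency-free sketch. The labels and predictions are invented for illustration, and the helper name `per_class_metrics` is hypothetical.

```python
from collections import Counter

# Invented true and predicted categories, for illustration only.
labels = ["Business", "Health", "Sport"]
y_true = ["Sport", "Sport", "Sport", "Business", "Business",
          "Health", "Health", "Health"]
y_pred = ["Sport", "Sport", "Business", "Business", "Business",
          "Health", "Sport", "Health"]

# Confusion matrix: rows are true categories, columns are predictions.
pair_counts = Counter(zip(y_true, y_pred))
matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} from a confusion matrix."""
    results = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = true_positives / predicted_total if predicted_total else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        results[label] = (precision, recall, f1)
    return results


metrics = per_class_metrics(matrix, labels)
for label, (precision, recall, f1) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In practice, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` compute the same quantities directly from the label lists.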

Feature Extraction Techniques:

The collection of text documents is converted to a matrix of token counts using a count vectorizer, which produces a sparse representation of the counts.
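
A minimal sketch of this step, assuming sklearn's `CountVectorizer` with default settings is the vectorizer in question. The three documents are made up.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

# Made-up documents for illustration only.
docs = [
    "stocks rally as markets rally",
    "team wins final",
    "markets slide",
]

vectorizer = CountVectorizer()           # lowercases and splits into word tokens
counts = vectorizer.fit_transform(docs)  # sparse matrix: documents x terms

# vocabulary_ maps each term to its column index in the matrix.
rally_column = vectorizer.vocabulary_["rally"]
rally_count_in_first_doc = counts[0, rally_column]

print(counts.shape)                # (number of documents, number of distinct terms)
print(issparse(counts))            # only non-zero counts are stored
print(rally_count_in_first_doc)    # occurrences of "rally" in the first document
```

Because most headlines contain only a few of the words in the vocabulary, the sparse format avoids storing the many zero entries.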

TF-IDF (term frequency–inverse document frequency) is a statistic intended to reflect how important a word is to a document in our corpus. It is used to extract the most meaningful words in the corpus.
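
The idea can be sketched with the textbook formula, tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. This is a dependency-free illustration on made-up text, and the helper name `tfidf_scores` is hypothetical. sklearn's `TfidfVectorizer` uses a smoothed and normalized variant, so its numbers will differ.

```python
import math
from collections import Counter

# Made-up documents for illustration only.
docs = [
    "stocks fall as markets slide",
    "markets rally as stocks rise",
    "team wins as fans cheer",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def tfidf_scores(tokens):
    """Score each term of one document with tf * log(N / df)."""
    term_counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
        for term, count in term_counts.items()
    }


scores = [tfidf_scores(tokens) for tokens in tokenized]

# Terms of the third document, highest score first.
ranked_third_doc = sorted(scores[2], key=scores[2].get, reverse=True)
print(ranked_third_doc)
```

A word such as "as" that occurs in every document gets a score of zero, while words unique to one document score highest there.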

Link to Dataset: News Article Dataset

TagMyNews Datasets is a collection of datasets of short text fragments that we used to evaluate our topic-based text classifier. This dataset contains ~32K English news items extracted from the RSS feeds of popular newspaper websites (nyt.com, usatoday.com, reuters.com). The categories are: Sport, Business, U.S., Health, Sci&Tech, World and Entertainment.

Packages required:

  • pandas
  • scikit-learn (sklearn)
  • NumPy

Result panels: Multinomial Naive Bayes, Softmax, SVM, Average of three