Classification of News Headlines

DEMO VIDEO

News headline classification using multiple machine learning models, with a comparison of their results.

Models implemented:

Multinomial Naive Bayes
Support Vector Machines
Neural Network with Softmax Layer
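As a rough illustration of how the three model families above could be set up side by side, here is a minimal sketch using scikit-learn. It is not the repository's actual code: the toy headlines, the `models` dictionary, the use of `LinearSVC` for the SVM, and the use of `MLPClassifier` (which applies a softmax output for multiclass targets) for the neural network are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy headlines and labels, not taken from the dataset.
texts = [
    "stocks rally as markets rebound",
    "central bank raises interest rates",
    "team wins championship final",
    "striker scores twice in cup match",
    "new vaccine trial shows promise",
    "doctors warn of flu season risks",
]
labels = ["Business", "Business", "Sport", "Sport", "Health", "Health"]

# One pipeline per model family; each shares the same TF-IDF front end.
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    # Assumed architecture: one small hidden layer, softmax output for 3 classes.
    "neural_net": make_pipeline(
        TfidfVectorizer(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(["markets slide after rates decision"])
```

Keeping the feature extractor identical across pipelines means any difference in results comes from the classifier rather than from the features.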

Metrics used to evaluate the performance of models:

Precision
Recall
F1 Score

We evaluate each classifier's ability to select the appropriate category given an article’s title and a brief article description. A confusion matrix is built for each model to explore the results and to calculate the metrics.
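The relationship between the confusion matrix and the three metrics can be sketched as follows. The label lists here are hypothetical and made up for illustration; only the scikit-learn calls are real. Per-class precision is the diagonal entry divided by its column sum, and recall is the diagonal entry divided by its row sum.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical true and predicted categories for six headlines.
class_names = ["Business", "Sport", "World"]
y_true = ["Sport", "Sport", "Business", "World", "World", "Business"]
y_pred = ["Sport", "World", "Business", "World", "World", "Sport"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=class_names)

# Per-class precision, recall, and F1 as reported by scikit-learn.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=class_names, zero_division=0
)

# The same precision and recall derived directly from the confusion matrix.
precision_from_cm = np.diag(cm) / cm.sum(axis=0)
recall_from_cm = np.diag(cm) / cm.sum(axis=1)
```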

Feature Extraction Techniques:

The collection of text documents is converted to a matrix of token counts using CountVectorizer, which produces a sparse representation of the counts.

TF-IDF (term frequency–inverse document frequency) is a statistic intended to reflect how important a word is to a document in our corpus. It is used to extract the most meaningful words in the corpus.

Link to Dataset: News Article Dataset

TagMyNews Datasets is a collection of datasets of short text fragments that we used to evaluate our topic-based text classifier. This dataset contains ~32K English news items extracted from RSS feeds of popular newspaper websites (nyt.com, usatoday.com, reuters.com). The categories are: Sport, Business, U.S., Health, Sci&Tech, World and Entertainment.

Packages required:

pandas
scikit-learn (sklearn)
NumPy

lengyyy/News-Classification
