This repository contains a code solution to the BBC multi-class classification problem hosted on Kaggle.
Text documents are one of the richest sources of data for businesses.
We’ll use a public dataset from the BBC comprising 2225 articles, each labeled with one of 5 categories: business, entertainment, politics, sport, or tech.
The dataset is broken into 1490 records for training and 735 for testing. The goal will be to build a system that can accurately classify previously unseen news articles into the right category.
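As a rough illustration of what such a classifier can look like, here is a minimal sketch of a multinomial Naive Bayes baseline written with only the Python standard library. This is not necessarily the approach used in this repository; the technique, the `NaiveBayesClassifier` and `tokenize` names, and the toy sentences are all assumptions made up for the example, and the real data would come from the competition files.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token
            score = math.log(doc_count / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy sentences invented for illustration; they are NOT from the BBC dataset.
train_texts = [
    "The striker scored twice as the team won the match",
    "Shares rose after the bank reported strong profits",
    "The new smartphone software update improves battery life",
    "The minister defended the government budget in parliament",
    "The film won best picture at the awards ceremony",
]
train_labels = ["sport", "business", "tech", "politics", "entertainment"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("The team scored a late goal to win the match")
print(prediction)
```

On the real data, the same `fit` and `predict` calls would be applied to the 1490 training articles and the 735 test articles respectively. A stronger feature representation such as TF-IDF would likely do better than raw word counts.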
Submissions to the competition are evaluated on accuracy.
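Accuracy is simply the fraction of articles whose predicted category matches the true one. A minimal sketch of the calculation is shown here; the `accuracy` helper and the example labels are made up for illustration and are not taken from the competition data.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels: three of the four predictions are correct.
y_true = ["sport", "tech", "business", "politics"]
y_pred = ["sport", "tech", "politics", "politics"]
score = accuracy(y_true, y_pred)
print(score)  # 0.75
```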
The following blog post has good information on how to approach the problem: https://cloud.google.com/blog/products/gcp/problem-solving-with-ml-automatic-document-classification
The competition page is at https://www.kaggle.com/c/learn-ai-bbc/overview