Classification of Text Articles

📜 Project Description

This project builds a Natural Language Processing (NLP) model that classifies more than 2,000 articles into 5 categories: Sport, Tech, Business, Entertainment and Politics.

Before the model was developed, every word was assigned a unique integer. During training, the model learned from this word-to-integer dictionary and related the resulting integer sequences to the article category. The model classifies the articles with 90% accuracy.
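The word-to-integer step described above can be sketched in plain Python. This is only an illustration of the idea, not the project's actual code: the saved `tokenizer.json` suggests a Keras tokenizer was used, and the function names, the `<OOV>` token, and the sample sentences below are all hypothetical.

```python
from collections import Counter


def build_word_index(texts, oov_token="<OOV>"):
    """Map each word to a unique integer; more frequent words get smaller ids."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Id 0 is left free for padding; id 1 is reserved for unseen words.
    word_index = {oov_token: 1}
    # Sort by descending frequency, then alphabetically for a stable order.
    for word, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        word_index[word] = len(word_index) + 1
    return word_index


def texts_to_sequences(texts, word_index, oov_token="<OOV>"):
    """Replace every word by its integer id (unknown words map to the OOV id)."""
    oov_id = word_index[oov_token]
    return [[word_index.get(w, oov_id) for w in text.lower().split()]
            for text in texts]


def pad_sequences(sequences, maxlen):
    """Truncate or right-pad with zeros so every sequence has length maxlen."""
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]


# Made-up sample sentences, for illustration only.
articles = ["shares rise as markets rally", "team wins the cup final", "markets fall"]
word_index = build_word_index(articles)
sequences = pad_sequences(texts_to_sequences(["markets crash"], word_index), maxlen=4)
```

The padded integer sequences are what a model would consume in place of raw text; fixing the length lets articles of different sizes share one input shape.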

A preview of the model architecture and the model report is shown below:

Model Architecture.png

Confusion Matrix.png
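As a rough illustration of how a confusion matrix relates to the reported accuracy, the sketch below counts predictions per true category and derives accuracy from the diagonal. The labels are made-up toy data, not the project's results, and the helper names are hypothetical; scikit-learn's `confusion_matrix` and `accuracy_score` offer the same functionality.

```python
CATEGORIES = ["Sport", "Tech", "Business", "Entertainment", "Politics"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true categories, columns are predicted categories."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only; one Business article is misread as Politics.
y_true = ["Sport", "Tech", "Business", "Politics", "Sport"]
y_pred = ["Sport", "Tech", "Politics", "Politics", "Sport"]

cm = confusion_matrix(y_true, y_pred, CATEGORIES)
acc = accuracy(cm)
```

Off-diagonal cells show which categories get confused with each other, which a single accuracy figure cannot reveal.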

🗂️ Project Files

👉 classification_of_articles.py (model development file)

👉 Folder saved_models

  • model.h5
  • ohe.pkl
  • tokenizer.json

👉 Images folder which contains the following images:

  • confusion matrix
  • epoch accuracy and epoch loss (from tensorboard)
  • model accuracy
  • model architecture
  • model parameter

👉 Logs folder (used for visualization in TensorBoard)

🚀 Project Usage

  1. This project was developed with Python 3.8 on Google Colab and uses the following modules:

     • scikit-learn
     • TensorFlow

  2. The dataset can be loaded from here

  3. You may download all the necessary files (dataset and Python files) to run the project on your own device.

  4. You can also open the file on Google Colab and run it there.

🧑‍💻 Credit

This dataset is taken from: Link