This project aims to build a Natural Language Processing (NLP) model that classifies more than 2000 articles into 5 categories: Sport, Tech, Business, Entertainment and Politics.
Before developing the model, every word was assigned a unique integer. During training, the model learned from this word-to-integer dictionary and related the encoded words to the article category. In this project, the model classifies the articles with 90% accuracy.
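The word-to-integer step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in classification_of_articles.py: the helper names (`build_word_index`, `encode_texts`, `pad_to_length`, `one_hot`), the `<OOV>` token, the padding length and the two sample sentences are all hypothetical. The files tokenizer.json and ohe.pkl suggest that the project saves a tokenizer and a one-hot encoder, but their exact configuration is not shown here.

```python
from collections import Counter

# The five categories named in this README (lowercased for the sketch).
CATEGORIES = ["sport", "tech", "business", "entertainment", "politics"]


def build_word_index(texts, oov_token="<OOV>"):
    """Assign every word a unique integer; frequent words get lower ids."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Id 0 is reserved for padding and id 1 for out-of-vocabulary words.
    word_index = {oov_token: 1}
    for word, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        word_index[word] = len(word_index) + 1
    return word_index


def encode_texts(texts, word_index, oov_token="<OOV>"):
    """Replace each word with its integer id (unknown words map to the OOV id)."""
    oov_id = word_index[oov_token]
    return [[word_index.get(w, oov_id) for w in text.lower().split()]
            for text in texts]


def pad_to_length(sequences, maxlen):
    """Truncate long sequences and right-pad short ones with 0."""
    return [seq[:maxlen] + [0] * (maxlen - len(seq[:maxlen]))
            for seq in sequences]


def one_hot(label):
    """Encode a category label as a 5-element one-hot vector."""
    return [1 if label == category else 0 for category in CATEGORIES]


# Hypothetical sample data, not taken from the real dataset.
articles = ["the team won the match", "the new phone launched today"]
labels = ["sport", "tech"]

word_index = build_word_index(articles)
padded = pad_to_length(encode_texts(articles, word_index), maxlen=6)
targets = [one_hot(label) for label in labels]

print(word_index)
print(padded)
print(targets)
```

The padded integer sequences are the model inputs and the one-hot vectors are the training targets, which is how the model can relate the encoded words to an article category.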
A sneak peek of the developed model and the model report is shown below:
👉 classification_of_articles.py (model development file)
👉 Folder saved_models
- model.h5
- ohe.pkl
- tokenizer.json
👉 Images folder which contains the following images:
- confusion matrix
- epoch accuracy and epoch loss (from TensorBoard)
- model accuracy
- model architecture
- model parameter
👉 Logs folder (used for visualization in TensorBoard)
This project was developed with Python 3.8 on Google Colab and uses the following modules:
-
The dataset can be loaded from here
You may download all the necessary files (dataset and Python files) to run the project on your device.
You can also open the file on Google Colab and run it there.
This dataset is taken from: Link