Text Classifier

Classify text into four categories (Business, Sci/fi, World, Sports).

Table of Contents
  1. About The Project
  2. Getting Started
  3. License
  4. Contact
  5. Acknowledgements

About The Project

The project can be broken down into the following steps:

  • import required dependencies

  • load and preprocess text data

    • preprocess text using a user-defined function that uses nltk and re to strip the text, remove stopwords, etc.

    • split the data using a split function built on sklearn

    • encode data using a user-defined encoder class

    • tokenize the text using a user-defined class

    • one-hot encode the text using a user-defined function

    • pad the features using a user-defined function

    • create a dataset and dataloader from a user-defined class

  • build a CNN model based on torch.nn.Module

  • build a user-defined trainer used for both training and evaluation

  • train the model

  • evaluate the model

  • run inference on a test sample
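
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the names `preprocess`, `Tokenizer`, `pad`, and `one_hot` are hypothetical, and the small hard-coded stopword set stands in for the nltk stopword list that the notebook relies on.

```python
import re

# Tiny illustrative stopword set (assumption); the notebook uses nltk's list.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "and", "to"}


def preprocess(text):
    """Lowercase, replace non-letters with spaces, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [word for word in text.split() if word not in STOPWORDS]


class Tokenizer:
    """Map each word to an integer id; 0 is padding, 1 is unknown."""

    def __init__(self):
        self.word_to_id = {"<pad>": 0, "<unk>": 1}

    def fit(self, documents):
        # Assign the next free id to every word not seen before.
        for words in documents:
            for word in words:
                self.word_to_id.setdefault(word, len(self.word_to_id))
        return self

    def encode(self, words):
        # Words missing from the vocabulary fall back to the unknown id.
        return [self.word_to_id.get(word, 1) for word in words]


def pad(ids, length, pad_id=0):
    """Truncate or right-pad a sequence of ids to a fixed length."""
    return ids[:length] + [pad_id] * max(0, length - len(ids))


def one_hot(ids, vocab_size):
    """Turn each id into a vector with a single 1 at that id's index."""
    return [[1 if i == token else 0 for i in range(vocab_size)] for token in ids]


texts = ["The market is up 3% today!", "Scientists launch a new probe"]
documents = [preprocess(t) for t in texts]
tokenizer = Tokenizer().fit(documents)
features = [pad(tokenizer.encode(d), length=5) for d in documents]
```

Padding every sequence to the same length is what lets the samples be stacked into a single batch before they reach the model.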

Built With

The main frameworks and libraries used in building this project, as named in the steps above, are torch, sklearn, and nltk.

Getting Started

Examine the Jupyter notebook and run the cells in the given order to prevent errors.

Prerequisites

Make sure you have the following packages installed in your virtual environment.

  • pip
    pip install torch
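
Before running the notebook, it may help to confirm that the libraries it imports are available. The sketch below is an assumption-based check, not part of the project: the `REQUIRED` list is taken from the libraries named in the steps above (torch, nltk, sklearn), and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Import names taken from the steps listed in this README (assumption:
# the notebook needs no other third-party packages).
REQUIRED = ["torch", "nltk", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Note that the `sklearn` import name is provided by the `scikit-learn` package on PyPI.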

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Samir Gouda - https://www.linkedin.com/in/samirgouda

Email: samiir.ahmedd@gmail.com

Project Link: https://github.com/SamirGouda/text_classifier

Acknowledgements