Classify text into 4 categories (Business, Sci/Fi, World, Sports)
The project can be broken down into the following steps:
- import the required dependencies
- load and preprocess the text data
- preprocess the text using a user-defined function that uses nltk and re to strip the text, remove stopwords, etc.
- split the data using a split function built on the sklearn framework
- encode the data using a user-defined encoder class
- tokenize the text using a user-defined class
- one-hot encode the text using a user-defined function
- pad the features using a user-defined function
- create the dataset and dataloader from a user-defined class
- build a CNN model based on torch.nn.Module
- build a user-defined trainer used for both training and evaluation
- train the model
- evaluate the model
- run inference on a test sample
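
The text-handling steps above (preprocess, tokenize, pad, one-hot encode) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the notebook's actual code: the names `preprocess`, `Tokenizer`, `pad`, and `one_hot` are hypothetical, and the small hardcoded `STOPWORDS` set stands in for the nltk stopword list that the project uses.

```python
import re

# Assumption: a tiny illustrative stopword set; the project relies on nltk's list.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "and", "in"}


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


class Tokenizer:
    """Hypothetical word-level tokenizer: index 0 is padding, 1 is unknown."""

    def __init__(self):
        self.token_to_index = {"<PAD>": 0, "<UNK>": 1}

    def fit(self, texts):
        # Assign the next free index to every token not seen before.
        for text in texts:
            for tok in text.split():
                self.token_to_index.setdefault(tok, len(self.token_to_index))
        return self

    def encode(self, text):
        # Unknown tokens map to index 1.
        return [self.token_to_index.get(tok, 1) for tok in text.split()]


def pad(sequences, pad_index=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_index] * (max_len - len(seq)) for seq in sequences]


def one_hot(sequence, vocab_size):
    """Turn a list of token indices into a list of one-hot rows."""
    return [[1 if i == idx else 0 for i in range(vocab_size)] for idx in sequence]


# Usage: clean two headlines, build a vocabulary, and pad to equal length.
raw = ["Stocks rally as the market opens", "A late goal wins the final"]
cleaned = [preprocess(t) for t in raw]
tokenizer = Tokenizer().fit(cleaned)
padded = pad([tokenizer.encode(t) for t in cleaned])
features = [one_hot(seq, len(tokenizer.token_to_index)) for seq in padded]
```

Each entry of `features` is a (sequence length x vocabulary size) grid of zeros and ones, which is the kind of input a 1D convolution over tokens can consume once it is converted to a tensor.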
Main frameworks and libraries used in building this project include PyTorch (torch), NLTK (nltk), and scikit-learn (sklearn).
Examine the Jupyter notebook and run the cells in the given order to prevent errors.
Make sure you have the following packages installed in your virtual environment:
- pip
pip install torch
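
Before running the notebook, it may help to confirm that the packages are importable from the active virtual environment. The snippet below is a small sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are assumptions based on the libraries named in this README, so adjust the list to match the notebook's actual imports. Note that sklearn is installed from pip under the name scikit-learn.

```python
import importlib.util

# Assumption: import names taken from the libraries mentioned in this README.
REQUIRED = ["torch", "nltk", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```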
Distributed under the MIT License. See LICENSE for more information.
Samir Gouda - https://www.linkedin.com/in/samirgouda
email: samiir.ahmedd@gmail.com
Project Link: https://github.com/SamirGouda/text_classifier
- Goku Mohandas for his great tutorials