Document Classifier

Short description of the project.

Overview

This project aims to classify text documents into different categories using a machine learning model trained on datasets collected from various sources. The classification model is built using the creme library, which allows for online/incremental learning, and the final model is serialized into a pickle file for later use.

Files

test.ipynb: Jupyter Notebook file for testing the accuracy of the model on a test dataset.
auto-retrain.ipynb: Jupyter Notebook file for training the model and serializing it into a pickle file.
README.md: This file containing project overview and instructions.

Usage

Testing the Model:
- Open test.ipynb in Jupyter Notebook.
- Run the notebook to evaluate the accuracy of the model on a test dataset.
Training and Serializing the Model:
- Open auto-retrain.ipynb in Jupyter Notebook.
- Run the notebook to train the model on the provided datasets and serialize it into a pickle file.

Requirements

Python 3.x
Jupyter Notebook
PyPDF2
creme
scikit-learn
pandas
matplotlib
seaborn

Usage Example

Here's an example of how to use the trained model: