Reddit Comment Sentiment Classifier

Overview

This is a machine learning model for classifying the sentiment of Reddit comments about crypto.

Model

The sentiment classifier is fine-tuned from a DistilBERT model. The training and validation data are extracted from this CSV file, with a 75/25 train/validation split.
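A 75/25 split like the one described above can be reproduced with scikit-learn's `train_test_split`. The sketch below is illustrative only. The column names `text` and `label`, the toy data standing in for the CSV, the fixed seed and the use of stratification are all assumptions, not details taken from train.ipynb.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_comments(df: pd.DataFrame, seed: int = 42):
    """Split labelled comments 75/25 into train and validation sets."""
    # test_size=0.25 gives the 75/25 ratio; stratifying on the (assumed)
    # "label" column keeps class proportions similar in both parts.
    train_df, val_df = train_test_split(
        df, test_size=0.25, random_state=seed, stratify=df["label"]
    )
    return train_df, val_df


# Hypothetical toy data standing in for the real CSV (normally pd.read_csv(...)).
df = pd.DataFrame({
    "text": [f"comment {i}" for i in range(100)],
    "label": [i % 2 for i in range(100)],
})
train_df, val_df = split_comments(df)
print(len(train_df), len(val_df))  # 75 25
```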

On the validation set, the model achieves an accuracy of 0.9148, a precision of 0.9333 and a recall of 0.9090.
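For reference, the three metrics can be computed with `sklearn.metrics`. This is a minimal sketch on made-up labels, assuming binary labels where `1` is the positive class; how the reported figures were actually computed or averaged is not stated here.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical validation labels and predictions (1 = positive class, an assumption).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

# For these lists: TP = 3, FN = 1, FP = 2, TN = 2.
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 5 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.6250 precision=0.6000 recall=0.7500
```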

The Jupyter notebook for fine-tuning the model is train.ipynb.

Build and Run Server

Clone the project.

git clone https://github.com/lizhaoliu/reddit-comment-classifier.git && cd reddit-comment-classifier

Download the model file and extract everything into the model directory, so that the layout looks like this:

reddit-comment-classifier/
├── model
│   ├── ckpt
│   │   ├── config.json
│   │   ├── optimizer.pt
│   │   ├── pytorch_model.bin
│   │   ├── rng_state.pth
│   │   ├── scheduler.pt
│   │   ├── trainer_state.json
│   │   └── training_args.bin
...
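Before starting the server, it can help to confirm that the checkpoint landed in the right place. The following is a small sketch, assuming it is run from the repository root. The helper name `missing_checkpoint_files` is hypothetical, and because the tree above is truncated, the file list covers only the files shown.

```python
from pathlib import Path

# File names copied from the directory tree above; the tree is truncated,
# so this list may not be exhaustive.
EXPECTED_CKPT_FILES = [
    "config.json", "optimizer.pt", "pytorch_model.bin", "rng_state.pth",
    "scheduler.pt", "trainer_state.json", "training_args.bin",
]


def missing_checkpoint_files(repo_root: str = ".") -> list[str]:
    """Return the expected checkpoint files that are absent under model/ckpt."""
    ckpt_dir = Path(repo_root) / "model" / "ckpt"
    return [name for name in EXPECTED_CKPT_FILES if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    print("All checkpoint files found." if not missing else f"Missing: {missing}")
```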

Create a Conda environment and install Python dependencies.

conda create -n reddit-sentiment-classifier -y -c pytorch -c huggingface python=3.10 pytorch scikit-learn pandas transformers flask && \
conda activate reddit-sentiment-classifier
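To confirm the environment is usable, a quick check like the following can be run inside the activated environment. It is a hypothetical helper, not part of the repository; it only checks that each package can be found, not which version is installed.

```python
import importlib.util

# Import names for the packages installed by the conda command above
# (scikit-learn is imported as "sklearn", pytorch as "torch").
REQUIRED_MODULES = ["torch", "sklearn", "pandas", "transformers", "flask"]


def check_dependencies(modules=REQUIRED_MODULES) -> dict[str, bool]:
    """Map each top-level module name to whether it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:12} {'ok' if found else 'MISSING'}")
```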

Start the Flask server, which listens on localhost:12345.

python server.py

Once the server is running, you can send POST requests containing the text to classify to the /predict endpoint.
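As a client-side sketch, the request can be sent with Python's standard library. The JSON body with a `text` field and the `application/json` content type are assumptions, since the expected request format is not documented here; check server.py for what the endpoint actually reads. The helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:12345/predict"


def build_predict_request(text: str, url: str = PREDICT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one comment to classify."""
    # Assumed payload shape: {"text": "..."}; adjust to match server.py.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(text: str, url: str = PREDICT_URL) -> str:
    """Send the request and return the raw response body as a string."""
    with urllib.request.urlopen(build_predict_request(text, url), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `predict("BTC is going to the moon")` returns the raw response body; its format depends on how server.py builds the response.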