This is a machine learning model that classifies the sentiment of Reddit comments about cryptocurrency.
The sentiment classifier is fine-tuned from a DistilBERT model. The training and validation data are extracted from this CSV file, with a 75/25 train/validation split.
The model's accuracy, precision, and recall on the validation set are 0.9148, 0.9333, and 0.9090, respectively.
The Jupyter notebook for fine-tuning the model is train.ipynb.
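For readers who want to see how the accuracy, precision and recall figures above are defined, here is a minimal sketch in plain Python. It assumes a binary labelling (1 for the positive class, 0 otherwise), which this README does not state, and the helper name `binary_metrics` is hypothetical. It is not the code used in train.ipynb.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for two equal-length label lists.

    Assumption: labels are binary and `positive` marks the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    total = len(y_true)
    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

Precision is the share of predicted positives that are correct, and recall is the share of actual positives that were found, so a model can score well on one and poorly on the other.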
- Clone the project.
git clone https://github.com/lizhaoliu/reddit-comment-classifier.git && cd reddit-comment-classifier
- Download the model file and extract everything to the model directory, i.e.
reddit-comment-classifier/
├── model
│ ├── ckpt
│ │ ├── config.json
│ │ ├── optimizer.pt
│ │ ├── pytorch_model.bin
│ │ ├── rng_state.pth
│ │ ├── scheduler.pt
│ │ ├── trainer_state.json
│ │ └── training_args.bin
...
- Create a Conda environment and install Python dependencies.
conda create -n reddit-sentiment-classifier -y -c pytorch -c huggingface python=3.10 pytorch scikit-learn pandas transformers flask && \
conda activate reddit-sentiment-classifier
- Start the Flask server, which runs on localhost:12345.
python server.py
- You can also make POST requests containing text data to the /predict endpoint.
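
As a rough sketch of such a request, the snippet below builds a POST to the /predict endpoint using only the Python standard library. The JSON body with a `text` field is an assumption, since this README does not document the payload format. Check server.py for the format it actually expects. The function names are hypothetical.

```python
import json
import urllib.request

# Host and port come from the README; the path is the /predict endpoint.
PREDICT_URL = "http://localhost:12345/predict"


def build_predict_request(text, url=PREDICT_URL):
    """Build a POST request carrying the comment text.

    Assumption: the server accepts a JSON object with a "text" field.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=PREDICT_URL, timeout=10.0):
    """Send the request and return the raw response body as a string."""
    request = build_predict_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, calling `predict("BTC is going to the moon")` would return whatever the server sends back. The response is left unparsed here because its format is not described in this README.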