A web app that visualizes a word-by-word breakdown of how a sentiment analysis model classifies text
- Research and decide on a machine learning model/architecture
- Pick out 2-3 datasets we can use to train
- Build a training pipeline
- Train and implement the model
- Serve the model using BentoML as an API
- Create a web app to take in input and visualize the output
Our endpoint is at https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/. The prediction endpoint can be accessed by making a POST request to https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict.
# e.g.
curl -X POST "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict" \
-H "accept: */*" -H "Content-Type: application/json" \
-d "{\"text\":\"Some example text.\"}"
Make sure to set the Content-Type header to application/json and send a JSON body in the format
{
"text": "content"
}
If successful, you should get a 200 OK status and a body along the lines of [[0.8614905476570129], [0.7018478512763977], [0.617088258266449]], where each entry represents the sentiment of one word, from 0 (negative) to 1 (positive).
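The same request can be made from Python with the `requests` library (a minimal sketch; the example sentence is arbitrary, and the response is parsed assuming the shape shown above):

```python
import requests

URL = "https://sentiment-classifier-gy7t3p45oq-uc.a.run.app/predict"
text = "Some example text."

# POST the sentence as JSON; requests sets Content-Type: application/json for us.
resp = requests.post(URL, json={"text": text}, timeout=30)
resp.raise_for_status()  # expect 200 OK

# The body is a list with one single-element list per word.
for word, (score,) in zip(text.split(), resp.json()):
    print(f"{word}: {score:.3f}  (0 = negative, 1 = positive)")
```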
Currently, we have only implemented a training pipeline for the IMDB dataset, but this is subject to change in the future. You can train a new classifier on the dataset by running
python train.py
This will replace the current model in /model: model.json stores the model architecture, weights.h5 stores the trained weights, and tokenizer.json stores the word indices.
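For reference, the three artifacts can be loaded back with tf.keras (a minimal sketch, assuming the files follow Keras's standard `model_from_json` / `tokenizer_from_json` formats):

```python
from tensorflow.keras.models import model_from_json
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# Rebuild the architecture from model.json, then load the trained weights.
with open("model/model.json") as f:
    model = model_from_json(f.read())
model.load_weights("model/weights.h5")

# Restore the word-index mapping used at training time.
with open("model/tokenizer.json") as f:
    tokenizer = tokenizer_from_json(f.read())
```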
BentoML lets us easily serve our Keras model through an API. You can package a new API by running
python bento_service_packager.py
> ...
> [0.07744759]
> [0.1166597 ]
> [0.18447165]
> [0.20329727]
> [0.24308157]
> [0.25030023]]
> _____
> saved model path: /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2
If you'd like to save the packaged API, just copy the contents into /bento_deploy
cp -r /Users/jzhao/bentoml/repository/SentimentClassifierService/20200604214004_F641D2/* bento_deploy
# or whatever the autogenerated URI is
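Under the hood, the packager pairs the trained Keras model with a BentoService definition and saves a versioned bundle, which is where the autogenerated path above comes from. Here is a rough sketch against the BentoML 0.7 API; the actual bento_service_packager.py may differ, and `to_sequences` stands in for our tokenizing/padding step:

```python
from bentoml import BentoService, api, artifacts, env
from bentoml.artifact import KerasModelArtifact
from bentoml.handlers import JsonHandler

@env(pip_dependencies=["tensorflow==2.1.0"])
@artifacts([KerasModelArtifact("model")])
class SentimentClassifierService(BentoService):
    @api(JsonHandler)
    def predict(self, parsed_json):
        # parsed_json is the decoded {"text": "..."} request body.
        # to_sequences is a hypothetical helper that tokenizes and pads the text.
        return self.artifacts.model.predict(to_sequences(parsed_json["text"]))

svc = SentimentClassifierService()
svc.pack("model", model)  # the Keras model loaded as in the earlier snippet
print("saved model path:", svc.save())  # prints the autogenerated bundle path
```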
There are a few dependency nuances to be aware of before building the actual Docker image. To make sure the build doesn't error out, edit bento_deploy/requirements.txt so that it reads
tensorflow==2.1.0
sklearn
bentoml==0.7.8
Then, we can build and run the image as follows
docker build -t bento-classifier:latest .
docker run -p 5000:5000 bento-classifier:latest
Then, visit localhost:5000 to see the BentoML server!
> model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 100, 64) 320000
_________________________________________________________________
lstm (LSTM) (None, 100, 64) 33024
_________________________________________________________________
dropout (Dropout) (None, 100, 64) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 64) 33024
_________________________________________________________________
FC1 (Dense) (None, 256) 16640
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
out_layer (Dense) (None, 1) 257
_________________________________________________________________
activation (Activation) (None, 1) 0
=================================================================
Total params: 402,945
Trainable params: 402,945
Non-trainable params: 0
_________________________________________________________________
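For reference, the summary above corresponds to a stack along these lines (a minimal tf.keras sketch; the dropout rates and the sigmoid activation are assumptions, since the summary doesn't show them):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense, Activation

model = Sequential([
    Embedding(5000, 64, input_length=100),  # 5k vocab, 64-dim embeddings, length-100 sequences
    LSTM(64, return_sequences=True),        # per-timestep outputs feed the second LSTM
    Dropout(0.5),                           # rate is an assumption
    LSTM(64),
    Dense(256, name="FC1"),
    Dropout(0.5),                           # rate is an assumption
    Dense(1, name="out_layer"),
    Activation("sigmoid"),                  # assumed, given outputs in [0, 1]
])
model.summary()  # should reproduce the shapes and param counts above
```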
- 85% / 15% train-test split
- dataset is balanced (25k positive, 25k negative)
- RMSProp with 1e-3 learning rate and early stopping with patience of 2 epochs
- preprocessing (see the sketch after this list)
  - to lowercase
  - removed punctuation
  - removed `<br />` tags
  - tokenized with vocab size of 5k
  - max sequence length of 100
- achieved 82.2% accuracy
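The preprocessing steps translate roughly into the following (a minimal sketch with tf.keras utilities; the exact cleaning rules in train.py may differ):

```python
import re
import string

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def clean(text):
    # lowercase, strip <br /> tags, then remove punctuation
    text = text.lower()
    text = re.sub(r"<br\s*/?>", " ", text)
    return text.translate(str.maketrans("", "", string.punctuation))

reviews = ["An example review.<br />It was great!"]  # stand-in data
cleaned = [clean(r) for r in reviews]

tokenizer = Tokenizer(num_words=5000)  # vocab capped at 5k words
tokenizer.fit_on_texts(cleaned)
padded = pad_sequences(tokenizer.texts_to_sequences(cleaned), maxlen=100)
```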