moinudeen/sentimix

App to predict the sentiment of a given code-mixed tweet( Hindi + English) using Natural Language Processing.

Jupyter Notebook

Sentimix Challenge

The task is to predict the sentiment of a given code-mixed tweet. The sentiment labels are positive, negative, or neutral, and the code-mixed languages will be English-Hindi.

Contents

This repo will cover the following things in their corresponding notebooks:

1. Data Exploration and Visualization
1. Classic ML models for baseline
1. Transformer based Deep Learning models
1. Testset Evaluation Reports

This repo also has api code to serve the models through a rest API. See below for more details.

Usage

Clone this repo:

git clone https://github.com/moinudeen/sentimix.git
cd sentimix

Install the dependencies:

pip install -r requirements.txt

To explore and visualize data, run this notebook: Data_Exploration_Demo

To train baseline ML models, run this notebook: Classical_ML_Models_Demo

To train deep transformer models, run this notebook: Transformer_Models_Demo

Finally, Take a look at the results of this experiment here: Testset_Evaluation_Report

Once you have trained your models, you can deploy them for real time inference.

To deploy the trained model in an app, please follow the steps below:

cd src/
uvicorn api.main:app --reload

The model is deployed by using FastAPI and uvicorn. Go to http://127.0.0.1:8000/docs to see api documentation.

For Simplicity, only the Logistic regression model has been uploaded to this repo. You can update the api endpoint to point to your own trained model by changing the path value in src/api/model_registry.json