Sample NLP streaming workflow using an LLM from Hugging Face and PyEnsign
This is an example of a sentiment analysis application built with Hugging Face and PyEnsign, using sample Yelp ratings data from Kaggle.
To use PyEnsign, create a free account on rotational.app. You will need to do the following once you create an account:
- Create a project.
- Add the `yelp_data` topic to the project. Check out this video on how to add a topic. You can choose your own name for the topic, but make sure that you update the code accordingly.
- Generate API keys for your project.
You will need to create and source the following environment variables prior to running the example:
export ENSIGN_CLIENT_ID="your client id here"
export ENSIGN_CLIENT_SECRET="your client secret here"
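With these variables sourced, the PyEnsign client can be created without passing credentials explicitly, since it reads `ENSIGN_CLIENT_ID` and `ENSIGN_CLIENT_SECRET` from the environment. A minimal check (not part of the example scripts) looks roughly like this:

```python
# Quick sanity check that the credentials are picked up from the environment.
# This snippet is illustrative only and is not one of the example scripts.
from pyensign.ensign import Ensign

ensign = Ensign()  # reads ENSIGN_CLIENT_ID and ENSIGN_CLIENT_SECRET from the environment
```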
This application consists of three components:
- `Trainer`: reads data from the `yelp_train.csv` file and builds a model using the pretrained `DistilBERT` LLM from Hugging Face. The best model gets written to the `final_model` directory.
- `ScoreDataPublisher`: reads data from the `yelp_score.csv` file and publishes to the `yelp_data` topic.
- `Scorer`: listens for new messages in the `yelp_data` topic. When it receives a new message, it uses the trained Hugging Face model in the `final_model` directory to make predictions.
To run the example, you will need three terminal windows. First, create a virtual environment and install the dependencies:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Run the `Trainer` in the first window (make sure to activate the virtual environment first). This will create three checkpoint directories under the `trained_models` directory and the final model configuration and weights in the `final_model` directory.
source venv/bin/activate
python huggingface_trainer.py
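For orientation, the training step performed by `huggingface_trainer.py` looks roughly like the sketch below: fine-tune the pretrained `DistilBERT` model on the training CSV and save the result to `final_model`. The column names (`text`, `label`), `num_labels`, and hyperparameters here are assumptions, so refer to the actual script for the exact configuration.

```python
# Minimal sketch of fine-tuning DistilBERT on the Yelp training data.
# Column names ("text", "label"), num_labels, and hyperparameters are assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Load the training data and tokenize the review text.
df = pd.read_csv("yelp_train.csv")
dataset = Dataset.from_pandas(df)
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

# Fine-tune, writing checkpoints under trained_models/ and the final model to final_model/.
args = TrainingArguments(output_dir="trained_models", num_train_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
trainer.save_model("final_model")
```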
Once the training is complete, run the `Scorer` in the second window (make sure to activate the virtual environment first).
source venv/bin/activate
python huggingface_scorer.py score
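Conceptually, the scoring loop subscribes to the `yelp_data` topic and runs each incoming review through the fine-tuned model saved in `final_model`. The sketch below illustrates the idea; the exact PyEnsign `subscribe` signature and the event payload format are assumptions, so follow `huggingface_scorer.py` and the PyEnsign docs for the real implementation.

```python
# Minimal sketch of the scoring loop: subscribe to the yelp_data topic and
# classify each incoming review with the fine-tuned model in final_model/.
# The subscribe API and the JSON payload with a "text" field are assumptions.
import asyncio
import json

from pyensign.ensign import Ensign
from transformers import pipeline

classifier = pipeline("text-classification", model="final_model")

async def score():
    ensign = Ensign()
    async for event in ensign.subscribe("yelp_data"):
        record = json.loads(event.data)          # assumes JSON-encoded events
        prediction = classifier(record["text"])  # assumes a "text" field in each record
        print(prediction)

if __name__ == "__main__":
    asyncio.run(score())
```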
Run the `ScoreDataPublisher` in the third window (make sure to activate the virtual environment first).
source venv/bin/activate
python huggingface_scorer.py score_data
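The publishing side is conceptually the reverse: read rows from `yelp_score.csv` and publish each one to the `yelp_data` topic as an event. The sketch below is illustrative only; the payload format and the exact `publish` call are assumptions, and `huggingface_scorer.py` contains the actual implementation.

```python
# Minimal sketch of the publisher: read yelp_score.csv and publish each row
# to the yelp_data topic as a JSON event. Column names and the publish call
# shown here are assumptions.
import asyncio
import json

import pandas as pd
from pyensign.ensign import Ensign
from pyensign.events import Event

async def publish_scores():
    ensign = Ensign()
    df = pd.read_csv("yelp_score.csv")
    for _, row in df.iterrows():
        event = Event(
            data=json.dumps(row.to_dict()).encode("utf-8"),
            mimetype="application/json",
        )
        await ensign.publish("yelp_data", event)
    # Depending on the client version, you may need to wait briefly before
    # exiting so that queued events are delivered.

if __name__ == "__main__":
    asyncio.run(publish_scores())
```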