Sample NLP streaming workflow using an LLM from Hugging Face and PyEnsign
This is an example of a sentiment analysis application built with Hugging Face and PyEnsign, using sample Yelp ratings data from Kaggle.
To use PyEnsign, create a free account on rotational.app. You will need to do the following once you create an account:
- Create a project.
- Add the `yelp_data` topic to the project. Check out this video on how to add a topic. You can choose your own name for the topic, but make sure that you update the code accordingly.
- Generate API keys for your project.
You will need to create and source the following environment variables prior to running the example:
export ENSIGN_CLIENT_ID="your client id here"
export ENSIGN_CLIENT_SECRET="your client secret here"
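With these variables sourced, the PyEnsign client can be created without passing credentials explicitly, since it reads `ENSIGN_CLIENT_ID` and `ENSIGN_CLIENT_SECRET` from the environment. A minimal check (not part of the example scripts) looks roughly like this:

```python
# Quick sanity check that the credentials are picked up from the environment.
# This snippet is illustrative only and is not one of the example scripts.
from pyensign.ensign import Ensign

ensign = Ensign()  # reads ENSIGN_CLIENT_ID and ENSIGN_CLIENT_SECRET from the environment
```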
This application consists of three components:
- `Trainer`: reads data from the `yelp_train.csv` file and builds a model using the pretrained `DistilBERT` LLM from Hugging Face. The best model gets written to the `final_model` directory.
- `ScoreDataPublisher`: reads data from the `yelp_score.csv` file and publishes to the `yelp_data` topic.
- `Scorer`: listens for new messages in the `yelp_data` topic. When it receives a new message, it uses the trained Hugging Face model in the `final_model` directory to make predictions.
To run the example, you will need three terminal windows. First, create a virtual environment and install the dependencies:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Run the `Trainer` in the first window (make sure to activate the virtual environment first). This will create three checkpoint directories under the `trained_models` directory and the final model configuration and weights in the `final_model` directory.
source venv/bin/activate
python huggingface_trainer.py
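For orientation, the training step performed by `huggingface_trainer.py` looks roughly like the sketch below: fine-tune the pretrained `DistilBERT` model on the training CSV and save the result to `final_model`. The column names (`text`, `label`), `num_labels`, and hyperparameters here are assumptions, so refer to the actual script for the exact configuration.

```python
# Minimal sketch of fine-tuning DistilBERT on the Yelp training data.
# Column names ("text", "label"), num_labels, and hyperparameters are assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Load the training data and tokenize the review text.
df = pd.read_csv("yelp_train.csv")
dataset = Dataset.from_pandas(df)
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

# Fine-tune, writing checkpoints under trained_models/ and the final model to final_model/.
args = TrainingArguments(output_dir="trained_models", num_train_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
trainer.save_model("final_model")
```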
Once the training is complete, run the `Scorer` in the second window (make sure to activate the virtual environment first).
source venv/bin/activate
python huggingface_scorer.py score
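Conceptually, the scoring loop subscribes to the `yelp_data` topic and runs each incoming review through the fine-tuned model saved in `final_model`. The sketch below illustrates the idea; the exact PyEnsign `subscribe` signature and the event payload format are assumptions, so follow `huggingface_scorer.py` and the PyEnsign docs for the real implementation.

```python
# Minimal sketch of the scoring loop: subscribe to the yelp_data topic and
# classify each incoming review with the fine-tuned model in final_model/.
# The subscribe API and the JSON payload with a "text" field are assumptions.
import asyncio
import json

from pyensign.ensign import Ensign
from transformers import pipeline

classifier = pipeline("text-classification", model="final_model")

async def score():
    ensign = Ensign()
    async for event in ensign.subscribe("yelp_data"):
        record = json.loads(event.data)          # assumes JSON-encoded events
        prediction = classifier(record["text"])  # assumes a "text" field in each record
        print(prediction)

if __name__ == "__main__":
    asyncio.run(score())
```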
Run the `ScoreDataPublisher` in the third window (make sure to activate the virtual environment first).
source venv/bin/activate
python huggingface_scorer.py score_data
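The publishing side is conceptually the reverse: read rows from `yelp_score.csv` and publish each one to the `yelp_data` topic as an event. The sketch below is illustrative only; the payload format and the exact `publish` call are assumptions, and `huggingface_scorer.py` contains the actual implementation.

```python
# Minimal sketch of the publisher: read yelp_score.csv and publish each row
# to the yelp_data topic as a JSON event. Column names and the publish call
# shown here are assumptions.
import asyncio
import json

import pandas as pd
from pyensign.ensign import Ensign
from pyensign.events import Event

async def publish_scores():
    ensign = Ensign()
    df = pd.read_csv("yelp_score.csv")
    for _, row in df.iterrows():
        event = Event(
            data=json.dumps(row.to_dict()).encode("utf-8"),
            mimetype="application/json",
        )
        await ensign.publish("yelp_data", event)
    # Depending on the client version, you may need to wait briefly before
    # exiting so that queued events are delivered.

if __name__ == "__main__":
    asyncio.run(publish_scores())
```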