This repo performs text classification with an LLM and benchmarks it against BERT and other models.
Predict whether a message is an invoice, order, ...
More info: see the customer-support-messages/ folder.
Model | Accuracy on Test Dataset |
---|---|
GPT-4O-Mini No Optimization | 0.77 |
GPT-4O-Mini System Prompt Optimization | 0.81 |
GPT-4O-Mini Few-Shot Examples | 0.85 |
GPT-4O-Mini Fine-Tuned | 0.99 |
BERT Zero-Shot | 0.06 |
BERT Fine-Tuned | 0.99 |

Predict the star rating of a review.
More info: see the yelp/ folder.
Model | Accuracy on Test Dataset |
---|---|
GPT-4O-Mini System Prompt Optimization | 0.62 |
GPT-4O-Mini Few-Shot Examples | 0.64 |
GPT-4O-Mini Fine-Tuned | 0.68 |
BERT Zero-Shot | 0.17 |
BERT Fine-Tuned | 0.42 |
Llama-3.2-1B Fine-Tuned | 0.42 |
Llama-3.1-8B Fine-Tuned | 0.46 |
Llama-3.2-3B Fine-Tuned | 0.52 |
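
The accuracy figures in the tables above are presumably the fraction of test examples whose predicted label exactly matches the gold label. A minimal sketch of that computation follows. The `accuracy` helper and the toy labels are hypothetical, for illustration only; they are not taken from this repo's code or datasets.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data, not taken from the repo's datasets.
gold = ["invoice", "order", "order", "invoice"]
pred = ["invoice", "order", "invoice", "invoice"]
print(f"{accuracy(pred, gold):.2f}")  # 3 of 4 correct -> prints 0.75
```

Exact-match accuracy treats every mistake equally, so on the star-rating task a prediction that is off by one star counts the same as one that is off by four.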