Classification with LLM

This repo tries to perform text classification using an LLM and benchmark it versus BERT and others.

Customer Support Messages Dataset

Predict if message is invoice,order,...
More info: customer-support-messages/

Model	Accuracy on Test Dataset
GPT-4O-Mini No Optimization	0.77
GPT-4O-Mini System Prompt Optimization	0.81
GPT-4O-Mini Few-Shot Examples	0.85
GPT-4O-Mini Fine-Tuned	0.99
BERT Zero-Shot	0.06
BERT Fine-Tuned	0.99

Predict the number of stars of a review.
More info: yelp/ folder.

Model	Accuracy on Test Dataset
GPT-4O-Mini System Prompt Optimization	0.62
GPT-4O-Mini Few-Shot Examples	0.64
GPT-4O-Mini Fine-Tuned	0.68
BERT Zero-Shot	0.17
BERT Fine-Tuned	0.42
Llama-3.2-1B Fine-Tuned	0.42
Llama-3.1-8B Fine-Tuned	0.46
Llama-3.2-3B Fine-Tuned	0.52