/classification-with-llm

How far can we go with an LLM for a classification problem

Primary LanguageJupyter Notebook

Classification with LLM

This repo tries to perform text classification using an LLM and benchmark it versus BERT and others.

Customer Support Messages Dataset

Predict if message is invoice,order,...
More info: customer-support-messages/

Model Accuracy on Test Dataset
GPT-4O-Mini No Optimization 0.77
GPT-4O-Mini System Prompt Optimization 0.81
GPT-4O-Mini Few-Shot Examples 0.85
GPT-4O-Mini Fine-Tuned 0.99
BERT Zero-Shot 0.06
BERT Fine-Tuned 0.99

Yelp Dataset

Predict the number of stars of a review.
More info: yelp/ folder.

Model Accuracy on Test Dataset
GPT-4O-Mini System Prompt Optimization 0.62
GPT-4O-Mini Few-Shot Examples 0.64
GPT-4O-Mini Fine-Tuned 0.68
BERT Zero-Shot 0.17
BERT Fine-Tuned 0.42
Llama-3.2-1B Fine-Tuned 0.42
Llama-3.1-8B Fine-Tuned 0.46
Llama-3.2-3B Fine-Tuned 0.52