TAPI 2024 course: Text Classification with LLMs

This repo contains the notebooks I used to teach a 2024 Text Analysis Pedagogy Institute course on text classification with LLMs.

Description

In this course, you will learn the basics of using a large language model (specifically, ChatGPT) for text classification. Using the ChatGPT application programming interface (API), we will explore how LLMs can assist humans (and humanists) with various text classification tasks (e.g., binary classification, labeling, assigning confidence scores to judgments). We will get to know the API, create validation data, engineer prompts, and automate API calls for large data sets.

Course Content

Each numbered notebook corresponds to one 90-minute class session.

Sessions presume that participants are already familiar with Python, Jupyter Notebooks, and pandas.

Lesson 1

Why classify texts?

  • What is text classification?
  • Why is text classification useful?
  • LLMs: The good, the bad, and the ugly

Technical introduction

  • Overview of LLMs in general
  • Distinction between ChatGPT on the website and the API
  • Overview of APIs generally and ChatGPT's API specifically
  • Overview of JSON and response_format={ "type": "json_object" }
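
As a minimal sketch of JSON mode (the model name and classification prompt here are illustrative placeholders, not taken from the course notebooks):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any JSON-mode-capable model works
    response_format={"type": "json_object"},  # forces the reply to be valid JSON
    messages=[
        {"role": "system", "content": 'You are a text classifier. Reply in JSON with a single key "label".'},
        {"role": "user", "content": "Classify the sentiment of: 'I loved this book.'"},
    ],
)
print(response.choices[0].message.content)  # e.g. {"label": "positive"}
```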

API Costs
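
The API is billed per token, with separate rates for input and output tokens. A rough way to estimate input costs before launching a large job is to count tokens locally; the price constant below is a placeholder, so check the current pricing page:

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o model family
PRICE_PER_1M_INPUT_TOKENS = 0.15  # hypothetical USD rate; verify against current pricing

texts = ["I loved this book.", "Dull and overlong."]
total_tokens = sum(len(enc.encode(t)) for t in texts)
print(f"{total_tokens} input tokens ≈ ${total_tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS:.6f}")
```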

Lesson 2

Review Lesson 1

  • Why classify texts?
  • LLMs: The good, the bad, and the ugly
  • Advantages of the API: automation, options not exposed in the web interface, structured output

Texts to classify

Overview of text classification types

  • Binary (two classes), multi-class (one label from several), multi-label (any number of labels per text), hierarchical (nested categories), and ordinal (ordered levels, e.g., 1–5 ratings)

Evaluating LLM classifications

  • How well can the LLM approximate human classification?
  • Gold-standard data
  • Inter-rater reliability
  • Measuring human-LLM agreement
  • Precision, recall, and F-score
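
As a sketch of how this evaluation might be scored, scikit-learn provides both an agreement statistic and precision/recall metrics (the labels below are invented examples, not course data):

```python
from sklearn.metrics import cohen_kappa_score, precision_recall_fscore_support

# Hypothetical gold-standard labels from a human coder and labels from the LLM
human = ["pos", "neg", "pos", "neg", "pos", "pos"]
llm   = ["pos", "neg", "neg", "neg", "pos", "pos"]

# Cohen's kappa: chance-corrected human-LLM agreement
print("kappa:", cohen_kappa_score(human, llm))

# Precision, recall, and F-score for the "pos" class
p, r, f, _ = precision_recall_fscore_support(human, llm, labels=["pos"], average=None)
print(f"precision={p[0]:.2f} recall={r[0]:.2f} F={f[0]:.2f}")
```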

Quantifying model uncertainty

  • Outputting confidence scores via JSON
  • Using logprobs to output classification token probabilities
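
A minimal logprobs sketch (model and prompt are illustrative): constraining the answer to a single token makes that token's log probability directly interpretable as the model's confidence in the label:

```python
import math
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    logprobs=True,
    top_logprobs=3,       # also return the runner-up tokens
    max_tokens=1,         # force a one-token answer
    messages=[
        {"role": "system", "content": "Answer with exactly one word: positive or negative."},
        {"role": "user", "content": "Classify: 'I loved this book.'"},
    ],
)

# Convert the log probability of the generated token to a probability
first_token = response.choices[0].logprobs.content[0]
print(first_token.token, math.exp(first_token.logprob))  # e.g. positive 0.98
```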

Lesson 3

Prompt engineering

  • Systematically testing prompts to find those that perform best
  • How to measure performance
  • Beware: garbage in, garbage out (GIGO)
  • Prompt engineering techniques

Systematically testing classification prompts

  • Generate sample data
  • Iterate through questions
  • Get classifications in JSON
  • Check low-confidence classification results
  • Test multiple prompts systematically
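
Putting those steps together, a loop like the following can compare candidate prompts against a small labeled sample (the prompts, sample, and classify() helper are a hypothetical sketch, not the course's code):

```python
import json
from openai import OpenAI

client = OpenAI()

# Tiny hand-labeled validation sample: (text, gold label)
sample = [("I loved this book.", "positive"), ("Dull and overlong.", "negative")]

prompts = [
    'Classify the sentiment as positive or negative. Reply in JSON: {"label": ...}',
    'Is the following review positive or negative? Reply in JSON: {"label": ...}',
]

def classify(prompt: str, text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)["label"]

# Score each prompt by simple accuracy on the sample
for prompt in prompts:
    correct = sum(classify(prompt, text) == gold for text, gold in sample)
    print(f"{correct}/{len(sample)} correct: {prompt[:45]}...")
```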

What can we do with classifications once we have them?

  • Study the classified texts
  • Use the classification results as evidence to describe the larger body of texts from which they were drawn
  • Use the classified subset to extract additional data
  • Perform additional classification or labeling steps (e.g., sub-classifications)
  • Extract data (e.g., authors and texts from questions about literature)
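
For instance, a follow-up extraction step on texts already classified as literature questions might look like this (the prompt and JSON schema are hypothetical):

```python
import json
from openai import OpenAI

client = OpenAI()

text = "Who wrote Mrs Dalloway, and how does it compare to Ulysses?"
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": (
                "Extract every author and literary work mentioned. "
                'Reply in JSON: {"works": [{"author": ..., "title": ...}]}'
            ),
        },
        {"role": "user", "content": text},
    ],
)
print(json.loads(response.choices[0].message.content))
```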