TAPI 2024 course: Text Classification with LLMs

This repo contains the notebooks I used to teach a 2024 Text Analysis Pedagogy Institute course on text classification with LLMs.

Description

In this course, you will learn the basics of using a large language model (specifically, ChatGPT) for text classification. Using the ChatGPT application programming interface (API), we will explore how LLMs can assist humans (and humanists) with various text classification tasks (e.g., binary classification, labeling, assigning confidence scores to judgments). We will get to know the API, create validation data, engineer prompts, and automate API calls for large data sets.

Course Content

Each numbered notebook corresponds to one 90-minute class session.

Sessions presume that participants are already familiar with Python, Jupyter Notebooks, and pandas.

Lesson 1

Why classify texts?

  • What is text classification?
  • Why is text classification useful?
  • LLMs: The good, the bad, and the ugly

Technical introduction

  • Overview of LLMs in general
  • Distinction between ChatGPT on the website and the API
  • Overview of APIs generally and ChatGPT's API specifically
  • Overview of JSON and response_format={ "type": "json_object" }
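
As a minimal sketch of JSON mode (the model name and classification prompt here are illustrative placeholders, not taken from the course notebooks):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any JSON-mode-capable model works
    response_format={"type": "json_object"},  # forces the reply to be valid JSON
    messages=[
        {"role": "system", "content": 'You are a text classifier. Reply in JSON with a single key "label".'},
        {"role": "user", "content": "Classify the sentiment of: 'I loved this book.'"},
    ],
)
print(response.choices[0].message.content)  # e.g. {"label": "positive"}
```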

API Costs
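
The API is billed per token, with separate rates for input and output tokens. A rough way to estimate input costs before launching a large job is to count tokens locally; the price constant below is a placeholder, so check the current pricing page:

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o model family
PRICE_PER_1M_INPUT_TOKENS = 0.15  # hypothetical USD rate; verify against current pricing

texts = ["I loved this book.", "Dull and overlong."]
total_tokens = sum(len(enc.encode(t)) for t in texts)
print(f"{total_tokens} input tokens ≈ ${total_tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS:.6f}")
```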

Lesson 2

Review Lesson 1

  • Why classify texts?
  • LLMs: The good, the bad, and the ugly
  • Advantages of the API: automation, options not exposed in the web interface, structured output

Texts to classify

Overview of text classification types

  • Binary (two classes), multi-class (one label from several), multi-label (any number of labels per text), hierarchical (nested categories), and ordinal (ordered levels, e.g., 1–5 ratings)

Evaluating LLM classifications

  • How well can the LLM approximate human classification?
  • Gold-standard data
  • Inter-rater reliability
  • Measuring human-LLM agreement
  • Precision, recall, and F-score
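
As a sketch of how this evaluation might be scored, scikit-learn provides both an agreement statistic and precision/recall metrics (the labels below are invented examples, not course data):

```python
from sklearn.metrics import cohen_kappa_score, precision_recall_fscore_support

# Hypothetical gold-standard labels from a human coder and labels from the LLM
human = ["pos", "neg", "pos", "neg", "pos", "pos"]
llm   = ["pos", "neg", "neg", "neg", "pos", "pos"]

# Cohen's kappa: chance-corrected human-LLM agreement
print("kappa:", cohen_kappa_score(human, llm))

# Precision, recall, and F-score for the "pos" class
p, r, f, _ = precision_recall_fscore_support(human, llm, labels=["pos"], average=None)
print(f"precision={p[0]:.2f} recall={r[0]:.2f} F={f[0]:.2f}")
```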

Quantifying model uncertainty

  • Outputting confidence scores via JSON
  • Using logprobs to output classification token probabilities
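
A minimal logprobs sketch (model and prompt are illustrative): constraining the answer to a single token makes that token's log probability directly interpretable as the model's confidence in the label:

```python
import math
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    logprobs=True,
    top_logprobs=3,       # also return the runner-up tokens
    max_tokens=1,         # force a one-token answer
    messages=[
        {"role": "system", "content": "Answer with exactly one word: positive or negative."},
        {"role": "user", "content": "Classify: 'I loved this book.'"},
    ],
)

# Convert the log probability of the generated token to a probability
first_token = response.choices[0].logprobs.content[0]
print(first_token.token, math.exp(first_token.logprob))  # e.g. positive 0.98
```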

Lesson 3

Prompt engineering

  • Systematically testing prompts to find those that perform best
  • How to measure performance
  • Beware: garbage in, garbage out (GIGO)
  • Prompt engineering techniques

Systematically testing classification prompts

  • Generate sample data
  • Iterate through questions
  • Get classifications in JSON
  • Check low-confidence classification results
  • Test multiple prompts systematically
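
Putting those steps together, a loop like the following can compare candidate prompts against a small labeled sample (the prompts, sample, and classify() helper are a hypothetical sketch, not the course's code):

```python
import json
from openai import OpenAI

client = OpenAI()

# Tiny hand-labeled validation sample: (text, gold label)
sample = [("I loved this book.", "positive"), ("Dull and overlong.", "negative")]

prompts = [
    'Classify the sentiment as positive or negative. Reply in JSON: {"label": ...}',
    'Is the following review positive or negative? Reply in JSON: {"label": ...}',
]

def classify(prompt: str, text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)["label"]

# Score each prompt by simple accuracy on the sample
for prompt in prompts:
    correct = sum(classify(prompt, text) == gold for text, gold in sample)
    print(f"{correct}/{len(sample)} correct: {prompt[:45]}...")
```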

What can we do with classifications once we have them?

  • Study the classified texts
  • Use the classification results as evidence to describe the larger body of texts from which they were drawn
  • Use the classified subset to extract additional data
  • Perform additional classification or labeling steps (e.g., sub-classifications)
  • Extract data (e.g., authors and texts from questions about literature)
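
For instance, a follow-up extraction step on texts already classified as literature questions might look like this (the prompt and JSON schema are hypothetical):

```python
import json
from openai import OpenAI

client = OpenAI()

text = "Who wrote Mrs Dalloway, and how does it compare to Ulysses?"
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": (
                "Extract every author and literary work mentioned. "
                'Reply in JSON: {"works": [{"author": ..., "title": ...}]}'
            ),
        },
        {"role": "user", "content": text},
    ],
)
print(json.loads(response.choices[0].message.content))
```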