Analysis of Watermarking for Language Models

This code provides an analysis and experimental evaluation of the paper "A Watermark for Large Language Models". It replicates key experiments on a smaller dataset to gain insights into the watermarking approach.

Overview

The paper proposes a method to imperceptibly watermark text generated by large language models. The key idea is to bias the model to overuse a randomized "green list" of tokens, enabling statistical detection.
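A minimal sketch of this biasing step, assuming a PyTorch logits tensor, is shown below. It mirrors the paper's notation (γ is the green-list fraction, δ is the logit boost) but is illustrative rather than the exact code in summarizer.py; the hash_key constant is an arbitrary seed.

```python
import torch

def green_list_ids(prev_token_id, vocab_size, gamma, hash_key=15485863):
    """Seed an RNG with the previous token and pick a random green list."""
    gen = torch.Generator()
    gen.manual_seed(hash_key * prev_token_id)
    perm = torch.randperm(vocab_size, generator=gen)
    return perm[: int(gamma * vocab_size)]

def watermarked_logits(logits, prev_token_id, gamma=0.5, delta=2.0):
    """Add delta to the logits of green-list tokens before sampling (the "soft" watermark)."""
    greens = green_list_ids(prev_token_id, logits.shape[-1], gamma)
    biased = logits.clone()
    biased[..., greens] += delta
    return biased
```

Because the green list is re-derived from the previous token at detection time, checking for the watermark requires only the tokenized text and the hash key, not access to the model itself.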

This code examines the approach by:

  • Summarizing text from the CNN dataset using a T5-Small model
  • Applying the watermarking technique during summarization
  • Measuring watermark strength (z-score) on the generated summaries
  • Evaluating the loss in summarization quality (perplexity); both metrics are sketched below
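Below is a hedged sketch of these two metrics; the real helpers live in utils.py and may differ in detail. watermark_z_score reuses the hypothetical green_list_ids helper from the sketch above, and perplexity assumes a Hugging Face-style seq2seq model (e.g., T5) whose forward pass returns a cross-entropy loss.

```python
import math
import torch

def watermark_z_score(token_ids, vocab_size, gamma=0.5):
    """One-proportion z-test from the paper:
    z = (green_count - gamma * T) / sqrt(T * gamma * (1 - gamma)),
    where T is the number of scored tokens."""
    green_count, T = 0, len(token_ids) - 1
    for prev, cur in zip(token_ids[:-1], token_ids[1:]):
        # Re-derive the green list from the previous token and check membership.
        if cur in green_list_ids(prev, vocab_size, gamma).tolist():
            green_count += 1
    return (green_count - gamma * T) / math.sqrt(T * gamma * (1 - gamma))

def perplexity(model, input_ids, summary_ids):
    """Perplexity of a generated summary under a seq2seq scoring model."""
    with torch.no_grad():
        out = model(input_ids=input_ids, labels=summary_ids)
    return math.exp(out.loss.item())
```

A large z-score means the summary contains far more green-list tokens than chance would predict, which is the statistical signature of the watermark.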

Files

  • summarizer.py - Summarization model with watermark-enabled generation
  • utils.py - Helper functions for computing perplexity and detecting the watermark (z-score)
  • watermark_for_LLM.ipynb - Notebook that runs the main experiments

Experiments

The main experiments replicate analyses from the paper:

  • Watermark strength (z-score) vs. sequence length
  • Tradeoff between summarization quality and watermark detectability

For both, the code evaluates how the z-score and perplexity change across different settings of the watermark hyperparameters δ (the logit boost for green-list tokens) and γ (the green-list fraction); an illustrative version of this sweep is sketched below.
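In this sketch, summarize_fn stands in for the watermarked summarization call in summarizer.py, watermark_z_score and perplexity are the hypothetical helpers sketched earlier, and the γ/δ grids are example values rather than the exact settings used in the notebook.

```python
import torch

def sweep(input_ids, scoring_model, summarize_fn, vocab_size,
          gammas=(0.25, 0.5), deltas=(0.5, 1.0, 2.0, 5.0)):
    """Record (z-score, perplexity) for every (gamma, delta) setting."""
    rows = []
    for gamma in gammas:
        for delta in deltas:
            # Generate a watermarked summary with this hyperparameter setting.
            summary_ids = summarize_fn(input_ids, gamma=gamma, delta=delta)
            rows.append({
                "gamma": gamma,
                "delta": delta,
                "z_score": watermark_z_score(summary_ids, vocab_size, gamma),
                "perplexity": perplexity(scoring_model, input_ids,
                                         torch.tensor([summary_ids])),
            })
    return rows
```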

Results

The analyses reproduce the core trends reported in the paper:

  • Longer sequences yield higher z-scores and are therefore easier to detect
  • Stronger watermarks (larger δ) raise the z-score but also increase perplexity, i.e., reduce summarization quality
  • Effectiveness still depends on properties of the dataset, such as the entropy of the generated text

Install requirements

pip install -r requirements.txt

Run watermark_for_LLM.ipynb to reproduce the experiments. The notebook outputs figures and metrics for each of the experiments above.

References

J. Kirchenbauer et al., "A Watermark for Large Language Models," arXiv:2301.10226, 2023. https://arxiv.org/abs/2301.10226