How do LLMs process multi-token words, common phrases, and named entities? We discover a pattern of token erasure that we hypothesize to be a 'footprint' of how LLMs process unnatural tokenization.
Read more about our paper here:
🌐 https://footprints.baulab.info
📄 https://arxiv.org/abs/2406.20086
To run our code, clone this repository and create a new virtual environment using Python 3.8.10:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
An implementation of Algorithm 1 in our paper is provided in segment.py. This script can be run like so:
python segment.py --document my_doc.txt --model meta-llama/Llama-2-7b-hf
allowing you to segment any paragraph of text into high-scoring token sequences.
segments from highest to lowest score:
'dramatic' 0.5815845847818102
'twists' 0.5553912909024803
'low bass' 0.41476866118921824
'cuss' 0.3979072428604316
'fifth' 0.3842911866668146
'using' 0.3568337553491195
...
'ive' -0.07994025301498671
's' -0.14006704260206485
'ations' -0.2306471753618856
'itions' -0.3348596893891435
Adding the --output_html flag will also save an HTML file in the style of the example below to the folder ./logs/html, bolding all multi-token sequences and coloring them blue if they have a higher erasure score.
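If you want to post-process the printed scores, the following is a minimal sketch that parses lines in the format shown above into (segment, score) pairs. It assumes you have captured the script's standard output as a string; `parse_segments` is a hypothetical helper and is not part of this repository.

```python
def parse_segments(output_text):
    """Parse lines such as "'low bass' 0.4147" into (segment, score) tuples.

    Assumes the quoted-segment-then-score format shown in the sample output;
    any line that does not match (headers, "...") is skipped.
    """
    segments = []
    for line in output_text.splitlines():
        line = line.strip()
        if not line.startswith("'"):
            continue
        # Split on the last space so segments containing spaces stay intact.
        quoted, _, score = line.rpartition(" ")
        try:
            segments.append((quoted[1:-1], float(score)))
        except ValueError:
            continue
    return segments


sample_output = """segments from highest to lowest score:
'dramatic' 0.5815845847818102
'twists' 0.5553912909024803
'low bass' 0.41476866118921824
...
'itions' -0.3348596893891435
"""

segments = parse_segments(sample_output)
# Keep only segments with a positive score.
positive = [text for text, score in segments if score > 0]
print(positive)
```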
To apply this segmentation algorithm to an entire dataset (as seen in Tables 3 through 6), run
python readout.py --model meta-llama/Meta-Llama-3-8B --dataset ../data/wikipedia_test_500.csv
which specifically replicates Appendix Table 6. You can use your own dataset csv, as long as it contains a 'text' column with the documents you want to analyze.
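As a rough illustration of the dataset requirement, the sketch below writes a CSV with a single 'text' column using only the standard library and then checks that the column is present. The file name `my_dataset.csv`, the example documents, and both helper functions are hypothetical; only the 'text' column name comes from the description above.

```python
import csv
import tempfile
from pathlib import Path


def write_text_dataset(documents, csv_path):
    """Write one document per row under a single 'text' column."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        for doc in documents:
            writer.writerow({"text": doc})


def has_text_column(csv_path):
    """Return True if the CSV header contains a 'text' column."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return "text" in (reader.fieldnames or [])


documents = [
    "The first document to segment.",
    "A second document, with a comma and a \"quoted\" phrase.",
]
out_path = Path(tempfile.mkdtemp()) / "my_dataset.csv"
write_text_dataset(documents, out_path)
print(out_path, has_text_column(out_path))
```

The resulting path can then be passed to readout.py through the --dataset flag, as in the command above.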
Checkpoints for each of the linear probes used in our paper are available at https://huggingface.co/sfeucht/footprints. To load a linear probe used in this paper, run the following code snippet:
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

class LinearModel(nn.Module):
    def __init__(self, input_size, output_size, bias=False):
        super(LinearModel, self).__init__()
        self.fc = nn.Linear(input_size, output_size, bias=bias)

    def forward(self, x):
        output = self.fc(x)
        return output

# example: llama-2-7b probe at layer 0, predicting 3 tokens ago
# predicting the next token would be `layer0_tgtidx1.ckpt`
checkpoint_path = hf_hub_download(
    repo_id="sfeucht/footprints",
    filename="llama-2-7b/layer0_tgtidx-3.ckpt"
)

# model_size is 4096 for both models.
# vocab_size is 32000 for Llama-2-7b and 128256 for Llama-3-8b
probe = LinearModel(4096, 32000).cuda()
probe.load_state_dict(torch.load(checkpoint_path))
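The checkpoint names in the snippet above appear to follow a `layer{L}_tgtidx{T}.ckpt` pattern. The sketch below builds such a filename from a layer and target index. `probe_filename` is a hypothetical helper, and the pattern is inferred only from the two examples given here (`layer0_tgtidx-3.ckpt` and `layer0_tgtidx1.ckpt`), so check the Hugging Face repository listing before relying on it for other layers or models.

```python
def probe_filename(model_dir, layer, target_idx):
    """Build a checkpoint path such as 'llama-2-7b/layer0_tgtidx-3.ckpt'.

    Assumption: every probe follows the naming pattern of the two
    examples in the README; this is not confirmed for all checkpoints.
    """
    return f"{model_dir}/layer{layer}_tgtidx{target_idx}.ckpt"


# Layer 0 probe predicting the token three positions back.
print(probe_filename("llama-2-7b", 0, -3))
# Layer 0 probe predicting the next token.
print(probe_filename("llama-2-7b", 0, 1))
```

The returned string can be passed as the `filename` argument of `hf_hub_download`.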
We have provided the probes used for the paper above. However, if you would still like to train your own linear probes, we provide code for training and testing linear probes on Llama hidden states in ./scripts. To train a probe on e.g. layer 12 to predict two tokens ago, run
python train_probe.py --layer 12 --target_idx -2
and a linear model will be trained on Llama-2-7b by default and stored as a checkpoint in ./checkpoints. These checkpoints can then be read by ./scripts/test_probe.py and tested on either CounterFact tokens, Wikipedia tokens (multi-token words or spaCy entities), or plain Pile tokens. Test results are stored in ./logs.
python test_probe.py --checkpoint ../checkpoints/Llama-2-7b-hf/.../final.ckpt --test_data counterfact_expanded.csv
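To make the meaning of --target_idx concrete, here is a minimal sketch of how a hidden state at one position could be paired with the token it is asked to predict. It assumes the target for position `i` is the token at position `i + target_idx`, which matches the description above (-2 for two tokens ago, 1 for the next token) but is not taken from train_probe.py itself. `probe_targets` and the token ids are hypothetical.

```python
def probe_targets(token_ids, target_idx):
    """Pair each position with the token id at position + target_idx.

    Positions whose target would fall outside the sequence are dropped.
    Assumption: this offset convention mirrors --target_idx; the actual
    training script may handle sequence boundaries differently.
    """
    pairs = []
    for pos in range(len(token_ids)):
        tgt = pos + target_idx
        if 0 <= tgt < len(token_ids):
            pairs.append((pos, token_ids[tgt]))
    return pairs


token_ids = [10, 11, 12, 13, 14]

# target_idx = -2: the hidden state at position 2 predicts the token at position 0.
print(probe_targets(token_ids, -2))
# target_idx = 1: the hidden state at position 0 predicts the token at position 1.
print(probe_targets(token_ids, 1))
```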
We use three datasets in this paper, which can all be found in ./data.
- CounterFact (Meng et al., 2022)
  - counterfact_expanded.csv was used for all of the CounterFact tests in the paper, and includes rows in addition to the original CounterFact dataset.
- Pile (Gao et al., 2020)
  - train_tiny_1000.csv was used to train all of the probes.
  - val_tiny_500.csv was used to validate probe hyperparameters.
  - test_tiny_500.csv was used for overall Pile test results.
- Wikipedia (Wikimedia Foundation, 2022)
  - wikipedia_test_500.csv was used for overall Wikipedia test results.
  - wikipedia_val_500.csv and wikipedia_train_1000.csv were not used in this work, but are included for completeness.