Magic_Words

Code for the paper "What's the Magic Word? A Control Theory of LLM Prompting".

Implements greedy back-generation and greedy coordinate gradient (GCG) to find optimal control prompts (magic words).

Setup

# create a virtual environment
python3 -m venv venv

# activate the virtual environment
source venv/bin/activate

# install the package and dependencies
pip install -e .
pip install -r requirements.txt
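
A quick import check can confirm the editable install worked. This is only a sanity-check sketch: it assumes the package installs under the name magic_words and that torch and transformers are among the dependencies.

# check_install.py -- sanity-check the editable install and core dependencies (assumed package name)
import torch
import transformers
import magic_words  # assumed package name provided by `pip install -e .`

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())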

Example Script (Pointwise Control)

Run scripts/backoff_hack_demo.py for a demo of finding the magic words (optimal control prompt) for a given question-answer pair using greedy search and greedy coordinate gradient (GCG). It applies the same algorithms as the LLM Control Theory paper:

python3 scripts/backoff_hack_demo.py

See the comments in the script for further details. This issue thread is also a good resource for getting up and running.
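
Both algorithms score a candidate control prompt u by how much probability the model assigns to the fixed answer y when u is prepended to the question x. The stand-alone sketch below shows that scoring step with Hugging Face transformers; the model name, question, and answer are placeholders, and this is not the code path used by the demo script.

# score_prompt.py -- sketch: log-likelihood of answer y given control prompt u prepended to question x
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper's experiments use larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def answer_logprob(u: str, x: str, y: str) -> float:
    """Sum of log P(y_t | u + x + y_<t); higher means u is a better control prompt for (x, y)."""
    prompt_ids = tok(u + x, return_tensors="pt").input_ids
    answer_ids = tok(y, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits at position t predict the token at position t + 1, so shift by one
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    answer_positions = range(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    return sum(logprobs[pos, input_ids[0, pos + 1]].item() for pos in answer_positions)

print(answer_logprob("Answer in one word. ", "What is the capital of France?", " Paris"))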

Example Script (Optimizing Prompts for Dataset)

Here we apply the GCG algorithm from the LLM attacks paper to optimize prompts on a dataset, similar to the AutoPrompt paper.

python3 scripts/sgcg.py \
    --dataset datasets/100_squad_train_v2.0.jsonl \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --k 20 \
    --max_parallel 30 \
    --grad_batch_size 50 \
    --num_iters 30
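
The per-example score generalizes to the dataset setting by averaging it over every (question, answer) pair while keeping one shared prompt u. The sketch below shows that aggregation; the JSONL field names and the injected scoring function are assumptions, and scripts/sgcg.py is the actual implementation.

# dataset_objective.py -- sketch: average a per-example prompt score over a JSONL dataset
import json
from typing import Callable, Iterator, Tuple

def load_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) pairs; the field names are an assumption about the JSONL schema."""
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            yield row["question"], row["answer"]

def dataset_loss(u: str, path: str, answer_logprob: Callable[[str, str, str], float]) -> float:
    """Mean negative answer log-likelihood of the shared prompt u over the whole dataset."""
    pairs = list(load_pairs(path))
    return -sum(answer_logprob(u, x, y) for x, y in pairs) / len(pairs)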
    

Open-Ended Exploration of the Reachable Set

python3 scripts/greedy_forward_single.py \
    --model meta-llama/Meta-Llama-3-8B \
    --x_0 "helloworld1" \
    --output_dir results/helloworld1 \
    --max_iters 100 \
    --max_parallel 100 \
    --pool_size 100 \
    --rand_pool \
    --push 0.1 \
    --pull 1.0 \
    --frac_ext 0.2 
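
In the paper's framing, the reachable set of an imposed state x_0 is the set of outputs the model can be steered to by some control prompt; the script above explores it with a greedy forward search over growing prompts. The brute-force sketch below only illustrates the concept on a tiny pool of candidate prompts (the model name and pool are placeholders, and this is not the algorithm in scripts/greedy_forward_single.py).

# reachable_set_sketch.py -- sketch: greedy next tokens reachable from x_0 under a small prompt pool
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the command above uses meta-llama/Meta-Llama-3-8B
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

x_0 = "helloworld1"
pool = [" please ", " never ", " translate ", " repeat ", " ignore "]  # tiny illustrative prompt pool

reachable = {}
with torch.no_grad():
    for u in pool:
        ids = tok(u + x_0, return_tensors="pt").input_ids
        next_id = model(ids).logits[0, -1].argmax().item()
        reachable.setdefault(tok.decode([next_id]), []).append(u)

# each key is a greedily decoded next token reached from x_0; the values are the prompts that reach it
for y, prompts in reachable.items():
    print(repr(y), "<-", prompts)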

Testing

# run all tests: 
coverage run -m unittest discover

# get coverage report:
coverage report --include=prompt_landscapes/*

# run a specific test:
coverage run -m unittest tests/test_compute_score.py