Code for the paper "What's the Magic Word? A Control Theory of LLM Prompting".
Implements greedy back generation and greedy coordinate gradient (GCG) search to find optimal control prompts ("magic words").
# create a virtual environment
python3 -m venv venv
# activate the virtual environment
source venv/bin/activate
# install the package and dependencies
pip install -e .
pip install -r requirements.txt
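As a quick sanity check that the editable install worked, the package should import cleanly from Python. This assumes the module is named `prompt_landscapes`, matching the coverage command in the testing section below; adjust if the module name differs.

```python
# Sanity check: the editable install should be importable.
# Assumes the package module is `prompt_landscapes` (see the coverage command below).
import prompt_landscapes
print(prompt_landscapes.__file__)
```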
Run scripts/backoff_hack_demo.py for a demo of finding the magic words (optimal
control prompt) for a given question-answer pair using greedy back generation and
greedy coordinate gradient (GCG). It applies the same algorithms as in the LLM
Control Theory paper:
python3 scripts/backoff_hack_demo.py
See the comments in the script for further details. This issue thread is also a good resource for getting up and running.
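For intuition, here is a minimal sketch of greedy back generation, independent of this repo's implementation: the control prompt is grown one token at a time by prepending whichever token most increases the probability of the target answer given the question. The model (`gpt2`), the question-answer pair, the restricted candidate set, and the prompt length below are illustrative assumptions chosen to keep the example small.

```python
# Minimal sketch of greedy back generation (illustrative, not this repo's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

question = "The capital of France is"   # imposed state x_0 (illustrative)
answer = " Paris"                       # target output y (illustrative)
question_ids = tok(question, add_special_tokens=False).input_ids
answer_ids = tok(answer, add_special_tokens=False).input_ids

@torch.no_grad()
def answer_logprob(prompt_ids):
    """log p(answer | prompt + question), summed over answer tokens."""
    ids = prompt_ids + question_ids + answer_ids
    logits = model(torch.tensor([ids], device=device)).logits.log_softmax(-1)
    total = 0.0
    for i, a in enumerate(answer_ids):
        pos = len(ids) - len(answer_ids) + i   # index of this answer token
        total += logits[0, pos - 1, a].item()  # logits at pos-1 predict token at pos
    return total

prompt = []                  # control prompt u, built back to front
candidates = range(1000)     # assumption: scan only part of the vocab for speed
for step in range(3):        # prompt length; the demo searches longer prompts
    scores = {c: answer_logprob([c] + prompt) for c in candidates}
    best = max(scores, key=scores.get)
    prompt = [best] + prompt
    print(step, repr(tok.decode(prompt)), scores[best])
```

This toy version restricts the candidate set purely for brevity; a full greedy search would score every vocabulary token at each step.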
Here we apply the GCG algorithm from the LLM attacks paper to optimize prompts over a dataset, similar to the AutoPrompt paper:
python3 scripts/sgcg.py \
--dataset datasets/100_squad_train_v2.0.jsonl \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--k 20 \
--max_parallel 30 \
--grad_batch_size 50 \
--num_iters 30
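For reference, below is a simplified, self-contained sketch of the GCG loop over a small toy dataset; it is not sgcg.py and omits its batching and sampling details. One GCG iteration computes the gradient of the answer loss with respect to a one-hot encoding of the prompt, takes the top-k candidate token swaps at each position, evaluates them, and keeps the best swap. The model (`gpt2`), toy question-answer pairs, prompt length, and loop counts are illustrative assumptions.

```python
# Simplified sketch of GCG (Zou et al., 2023) over a toy dataset; not sgcg.py itself.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()
for p in model.parameters():
    p.requires_grad_(False)            # only the prompt's one-hot needs gradients
embed = model.get_input_embeddings()   # (vocab_size, d) embedding matrix

def answer_loss(prompt_onehot, q_ids, a_ids):
    """Cross-entropy of the answer given prompt + question; differentiable in the one-hot prompt."""
    prompt_emb = prompt_onehot @ embed.weight                        # (k, d)
    ctx_emb = embed(torch.tensor(q_ids + a_ids, device=device))      # (m, d)
    logits = model(inputs_embeds=torch.cat([prompt_emb, ctx_emb]).unsqueeze(0)).logits[0]
    start = prompt_onehot.shape[0] + len(q_ids)                      # first answer position
    return F.cross_entropy(logits[start - 1:start - 1 + len(a_ids)],
                           torch.tensor(a_ids, device=device))

# Toy stand-in for the question-answer dataset (the real script reads a SQuAD-style jsonl).
data = [("The capital of France is", " Paris"), ("2 + 2 =", " 4")]
data_ids = [(tok(q, add_special_tokens=False).input_ids,
             tok(a, add_special_tokens=False).input_ids) for q, a in data]

k, top_k, vocab = 5, 20, embed.num_embeddings   # prompt length, candidates per position
prompt = torch.randint(0, vocab, (k,), device=device)

for it in range(3):
    onehot = F.one_hot(prompt, vocab).float().requires_grad_()
    loss = sum(answer_loss(onehot, q, a) for q, a in data_ids) / len(data_ids)
    loss.backward()
    cand = (-onehot.grad).topk(top_k, dim=1).indices   # (k, top_k) promising swaps
    best_prompt, best_loss = prompt.clone(), loss.item()
    with torch.no_grad():
        for pos in range(k):                           # greedily try one swap at a time
            for tok_id in cand[pos]:
                trial = prompt.clone()
                trial[pos] = tok_id
                l = sum(answer_loss(F.one_hot(trial, vocab).float(), q, a)
                        for q, a in data_ids) / len(data_ids)
                if l.item() < best_loss:
                    best_prompt, best_loss = trial, l.item()
    prompt = best_prompt
    print(it, best_loss, repr(tok.decode(prompt)))
```

The real script adds dataset and gradient batching plus parallel candidate evaluation; check scripts/sgcg.py for the exact meaning of --k, --max_parallel, --grad_batch_size, and --num_iters.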
To run the greedy forward generation search from a single starting prompt (given by --x_0):
python3 scripts/greedy_forward_single.py \
--model meta-llama/Meta-Llama-3-8B \
--x_0 "helloworld1" \
--output_dir results/helloworld1 \
--max_iters 100 \
--max_parallel 100 \
--pool_size 100 \
--rand_pool \
--push 0.1 \
--pull 1.0 \
--frac_ext 0.2
# run all tests:
coverage run -m unittest discover
# get coverage report:
coverage report --include=prompt_landscapes/*
# run a specific test:
coverage run -m unittest tests/test_compute_score.py