/adversarial-random-search-gpt4

Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023]


📢 Update on 28 April 2024: This short write-up is superseded by our subsequent work Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, which uses random search to jailbreak many of the leading safety-aligned LLMs, including GPT-4. See https://github.com/tml-epfl/llm-adaptive-attacks for the improved implementation of random search.
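To illustrate the core idea, here is a minimal sketch of random search over an adversarial suffix. It is not the repository's implementation: in the real attack, the objective would be computed by querying the GPT-4 API (e.g. the log-probability of an affirmative response), whereas here a hypothetical toy objective (character matches against a fixed target string) stands in so the sketch runs without API access.

```python
import random
import string

# Hypothetical toy objective: in the actual attack, score(suffix) would query
# the GPT-4 API and return e.g. the log-probability that the response starts
# with an affirmative token. Here we just count character matches with a
# fixed target so the loop is self-contained and deterministic.
TARGET = "sure here is"

def score(suffix: str) -> int:
    return sum(a == b for a, b in zip(suffix, TARGET))

def random_search(n_iters: int = 5000, seed: int = 0) -> str:
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    # Start from a random suffix of the target's length.
    suffix = [rng.choice(alphabet) for _ in range(len(TARGET))]
    best = score("".join(suffix))
    for _ in range(n_iters):
        i = rng.randrange(len(suffix))    # pick a random position
        old = suffix[i]
        suffix[i] = rng.choice(alphabet)  # propose a random substitution
        new = score("".join(suffix))
        if new >= best:                   # keep the change if not worse
            best = new
        else:                             # otherwise revert it
            suffix[i] = old
    return "".join(suffix)
```

The loop mutates one position at a time and greedily keeps any substitution that does not decrease the objective, which is the essence of the random-search attack; against the real API, each `score` call costs one query.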

Conversations

Standard conversation #1: https://platform.openai.com/playground/p/gEEPak6gtzI4HiMnoKhksBc6?mode=chat

Adversarial conversation #1: https://platform.openai.com/playground/p/0IU3UOP70KoviepEIjXGBvUU?mode=chat

Standard conversation #2: https://platform.openai.com/playground/p/gTciUFzOZiQaYl65G16frCxT?mode=chat

Adversarial conversation #2: https://platform.openai.com/playground/p/szF1Vs2IwW5Hr9xUFM1DL8fb?mode=chat

Optimization with random search for both conversations

How to run the notebook

Provide your OPENAI_API_KEY as an environment variable (see OpenAI's API documentation for how to obtain a key). Then you are good to go!
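For example, on Linux or macOS you can export the variable in the shell session from which you launch Jupyter (the key below is a placeholder; substitute your own):

```shell
# Make the key visible to the notebook's kernel; replace the placeholder value.
export OPENAI_API_KEY="sk-..."
```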