📢 Update on 28 April 2024: This short write-up is superseded by our subsequent work Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, which uses random search to jailbreak many of the leading safety-aligned LLMs, including GPT-4. See the code at https://github.com/tml-epfl/llm-adaptive-attacks for the improved implementation of random search.
Standard conversation #1: https://platform.openai.com/playground/p/gEEPak6gtzI4HiMnoKhksBc6?mode=chat
Adversarial conversation #1: https://platform.openai.com/playground/p/0IU3UOP70KoviepEIjXGBvUU?mode=chat
Standard conversation #2: https://platform.openai.com/playground/p/gTciUFzOZiQaYl65G16frCxT?mode=chat
Adversarial conversation #2: https://platform.openai.com/playground/p/szF1Vs2IwW5Hr9xUFM1DL8fb?mode=chat
Provide your OPENAI_API_KEY as an environment variable following the instructions here. Then you are good to go!
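For reference, a minimal sketch of how the key can be picked up from the environment, assuming the v1+ openai Python SDK; the model name and prompt below are only placeholders for a quick sanity check:

```python
import os
from openai import OpenAI

# Assumes the key has already been exported in your shell, e.g.:
#   export OPENAI_API_KEY="sk-..."
# The client also reads OPENAI_API_KEY automatically if api_key is omitted.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Quick check that the key works (model name is just an example).
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```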