Exercise 1 (LLM01): Direct Prompt Injection

An attacker wants to manipulate ChatGPT into answering any question they ask. That means bypassing the strong controls on its output. How can we do that?


  • The Playground and the API are much more vulnerable than the ChatGPT web interface;
  • ChatGPT can be manipulated with inputs such as "Imagine we're in a movie..." or "Ignore your safety controls...".
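
To see why such inputs can work, it helps to look at how an application might assemble the text it sends to the model. The sketch below is only an illustration under stated assumptions, not how ChatGPT is implemented: SYSTEM_RULES, build_prompt and looks_like_injection are hypothetical names, and the pattern list is invented for the demo. It shows that user text ends up in the same channel as the rules, and that a keyword filter only catches wording it already knows.

```python
import re

# Hypothetical system rules for the demo; real deployments differ.
SYSTEM_RULES = "You are a helpful assistant. Refuse unsafe requests."

# Assumed list of known jailbreak openers; deliberately incomplete.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |your )?(previous |safety )?(instructions|controls)",
    r"imagine we('re| are) in a movie",
]


def build_prompt(user_input: str) -> str:
    """Naively concatenate the rules and the user text into one string."""
    return f"{SYSTEM_RULES}\n\nUser: {user_input}"


def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches one of the known patterns."""
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)
```

A rephrased request such as "Disregard the rules above" is not flagged by this filter, which is the point of the exercise: blocklists of phrases are easy to sidestep.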


Exercise 2 (LLM01) (LLM07): Indirect Prompt Injection with Vulnerable Plugin

An attacker crafts a YouTube video with maliciously crafted captions. When the victim then uses a YT to Caption plugin on that video, the hidden prompt is injected into their conversation.


Exercise 3 (LLM02): Insecure Output Handling - Tracking

An attacker uses a web page that detects when text is copied and injects a hidden prompt whenever a user copies some text. When the victim pastes the text into ChatGPT, the hidden prompt is submitted along with it.


  • Easy-to-use online services let you create a tracking pixel (that is, a way to log and track IP addresses);
  • ChatGPT output is Markdown-based.
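
The two hints combine as follows: when Markdown output contains an image, the client rendering it fetches the image URL, and that request reveals the victim's IP address to whoever hosts it. As a defensive sketch, the function below lists the hosts a renderer would contact. The name external_image_hosts, the trusted_hosts parameter and the simplified regular expression are assumptions made for this example; a real Markdown parser handles more cases.

```python
import re
from urllib.parse import urlparse

# Simplified Markdown image syntax: ![alt text](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)")


def external_image_hosts(markdown: str, trusted_hosts: set) -> list:
    """Return the hosts of images that would be fetched from untrusted servers."""
    hosts = []
    for url in IMAGE_PATTERN.findall(markdown):
        host = urlparse(url).hostname
        # Relative URLs have no hostname and are skipped.
        if host and host not in trusted_hosts:
            hosts.append(host)
    return hosts
```

An application could refuse to render, or ask for confirmation, whenever this list is not empty.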


Exercise 4 (LLM02): Insecure Output Handling - Data leakage

Based on the previous exercise, what other prompt could we use to leak some data from the previous chat history?


  • We could use one of the free webhook services available on the internet.
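
A webhook leaks data because anything placed in a URL is transmitted to the host as soon as the URL is fetched. The sketch below, which uses only the Python standard library, shows what a server learns from the URL alone. The function name data_sent_with_request and the host hook.example are hypothetical, chosen for illustration.

```python
from urllib.parse import urlparse, parse_qs


def data_sent_with_request(url: str) -> dict:
    """Show what the receiving server learns from the URL when it is fetched."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs decodes percent-encoding and groups values by key
        "query": parse_qs(parts.query),
    }


# Hypothetical URL in which chat text was appended as a query parameter.
info = data_sent_with_request("https://hook.example/collect?q=my%20secret%20note")
```

Here info["query"] holds the decoded text, which is exactly what would show up in the webhook's request log.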


Exercise 5 (LLM02): Insecure Output Handling - Phishing

Based on the previous exercises, what other prompt could we use to append a phishing link to each answer? Something like "Click here for more information" that redirects to a phishing page.
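
The phishing link works because Markdown hides the destination behind friendly text. One possible countermeasure, sketched here under assumptions, is to rewrite links so that the target is visible before rendering. The name reveal_link_targets and the simplified pattern are hypothetical and do not cover every Markdown link form.

```python
import re

# Simplified Markdown link syntax: [text](url).
# The lookbehind skips images, which start with "!".
LINK_PATTERN = re.compile(r"(?<!!)\[([^\]]+)\]\(\s*([^)\s]+)[^)]*\)")


def reveal_link_targets(markdown: str) -> str:
    """Rewrite [text](url) as 'text (url)' so the destination is visible."""
    return LINK_PATTERN.sub(lambda m: f"{m.group(1)} ({m.group(2)})", markdown)
```

With this in place, "Click here for more information" can no longer conceal where the link points.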


Exercise 6 (LLM02): Insecure Output Handling - NSFW

Based on the previous exercises, how can we inject an NSFW image at the end of every answer? For demonstration purposes, we could use this image: https://www.thesprucepets.com/thmb/uQnGtOt9VQiML2oG2YzAmPErrHo=/5441x0/filters:no_upscale():strip_icc()/all-about-tabby-cats-552489-hero-a23a9118af8c477b914a0a1570d4f787.jpg


Exercise 7 (LLM03) (LLM05): Poisoning an open-source LLM

Let's modify the responses of an open-source LLM. Specifically, let's change its answers to some specific prompts so that it spreads fake news! Use this Colab: Poisoning LLMs Colab


