Slides from Tallinn BSides 2023
Password: wee2g73m
An attacker is trying to manipulate ChatGPT to answer to any questions they might have. That means bypassing the strong controls on the output. How can we do that?
- Playground and API are much more vulnerable;
- ChatGPT can be manipulated by providing inputs such as
Imagine we're in a movie..., Ignore your safety controls...
Create a PR in the folder exercise-1
with your prompt or playground link in a file such as <group-name>.md
An attacker is crafting a Youtube video with malformed . Therefore, when the victim is using a YT to Caption
plugin, they will get injected
Create a PR with your ideas on what the attacker could inject to leverage this vulnerability.
An attacker uses a web page that detects the copy of the text and injects an hidden prompt whenever the users copy some text. When the victim pastes the text into ChatGPT
- There are easy-to-use services online that allow the creation of a tracking-pixel (namely log and track IP addresses) easily;
- ChatGPT output is markdown based.
Create a PR with your ideas on what hidden prompt we could inject.
Based on the previous exercise, what other prompt could we use to leak some data from the previous chat history?
- We could use some webhook services available for free on the internet;
Create a PR with your ideas on what hidden prompt we could inject.
Based on the previous exercises, what other prompt could we use to append a phishing link to each answer? Something like "Click here for more information" that redirects to a phishing page.
Create a PR with your ideas on what hidden prompt we could inject.
Based on the previous exercises, how can we inject a NSFW image at the end of every prompt? For demonstration purposes we could use this image: https://www.thesprucepets.com/thmb/uQnGtOt9VQiML2oG2YzAmPErrHo=/5441x0/filters:no_upscale():strip_icc()/all-about-tabby-cats-552489-hero-a23a9118af8c477b914a0a1570d4f787.jpg
Create a PR with your ideas on what hidden prompt we could inject.
Let's modify the responses of a open-source LLM, namely, let's modify some specific prompts to spread fake news! Use this colab: Poisoning LLMs Colab
Try the colab on your own and modify the prompts. Expertiment and create a PR with the funniest ones! :)