/TrickLLM

This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashishta*, Atharva Naik*, Somak Aditya, and Monojit Choudhury, accepted at LREC-CoLING 2024

Primary LanguageJupyter NotebookGNU Affero General Public License v3.0AGPL-3.0

Stargazers