Xq0273's Stars
declare-lab/ferret
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
declare-lab/red-instruct
Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
shreyansh26/Red-Teaming-Language-Models-with-Language-Models
A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022
Improbable-AI/curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizXgXU)
EasyJailbreak/EasyJailbreak
An easy-to-use Python framework to generate adversarial jailbreak prompts.
sherdencooper/GPTFuzz
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts