safety-alignment
There are 3 repositories under the safety-alignment topic.
ferret
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
holistic_automated_red_teaming
[EMNLP 2024] Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
MOSSBench
This is the official implementation (code, data) of the paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?"