llm-safety

There are 5 repositories under the llm-safety topic.

  • PKU-YuanGroup/Hallucination-Attack

Attack for inducing hallucinations in LLMs

Language: Python
  • Libr-AI/OpenRedTeaming

Papers about red teaming LLMs and multimodal models.

  • Babelscape/ALERT

    Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

Language: Python
  • declare-lab/resta

Restore safety in fine-tuned language models through task arithmetic (see the sketch after this list)

Language: Python
  • copyleftdev/ai-testing-prompts

A comprehensive LLM testing suite for safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models like OpenAI's GPT series in real-world applications.
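
The declare-lab/resta entry describes restoring safety in fine-tuned models through task arithmetic. As a rough illustration of the general idea only (not resta's actual implementation), the sketch below treats the weight-space difference between a safety-aligned model and an unaligned counterpart as a "safety vector" and adds a scaled copy of it to a fine-tuned model; the toy nn.Linear models, the helper names, and the 0.5 scaling factor are placeholder assumptions.

```python
# Minimal sketch of safety restoration via task arithmetic (assumed, simplified).
# Idea: compute a "safety vector" as (aligned weights - unaligned weights),
# then shift a fine-tuned model's weights by a scaled copy of that vector.
import torch
import torch.nn as nn


def safety_vector(aligned: nn.Module, unaligned: nn.Module) -> dict:
    """Per-parameter difference: aligned weights minus unaligned weights."""
    aligned_sd, unaligned_sd = aligned.state_dict(), unaligned.state_dict()
    return {k: aligned_sd[k] - unaligned_sd[k] for k in aligned_sd}


def add_safety_vector(model: nn.Module, vector: dict, scale: float = 0.5) -> nn.Module:
    """Shift the model's weights by `scale` times the safety vector."""
    sd = model.state_dict()
    model.load_state_dict({k: sd[k] + scale * vector[k] for k in sd})
    return model


if __name__ == "__main__":
    # Toy stand-ins for an aligned model, its unaligned variant, and a fine-tuned model.
    torch.manual_seed(0)
    aligned, unaligned, finetuned = (nn.Linear(4, 4) for _ in range(3))
    vec = safety_vector(aligned, unaligned)
    restored = add_safety_vector(finetuned, vec, scale=0.5)
    print({k: tuple(v.shape) for k, v in restored.state_dict().items()})
```

In practice the same arithmetic would be applied to the full state dict of a language model rather than toy layers, and the scaling factor would be tuned to balance restored safety against the downstream task performance gained by fine-tuning.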