llm-safety

There are 5 repositories under the llm-safety topic.

  • PKU-YuanGroup/Hallucination-Attack

Attack for inducing hallucinations in LLMs

Language: Python
  • Libr-AI/OpenRedTeaming

Papers about red teaming LLMs and multimodal models.

  • Babelscape/ALERT

    Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

Language: Python
  • declare-lab/resta

Restore safety in fine-tuned language models through task arithmetic (see the sketch after this list)

Language: Python
  • copyleftdev/ai-testing-prompts

A comprehensive LLM testing suite for safety, performance, bias, and compliance, with methodologies and tools to improve the reliability and ethical integrity of models like OpenAI's GPT series in real-world applications.
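
The declare-lab/resta entry describes restoring safety in fine-tuned models through task arithmetic. As a rough illustration of the general idea only (not resta's actual implementation), the sketch below treats the weight-space difference between a safety-aligned model and an unaligned counterpart as a "safety vector" and adds a scaled copy of it to a fine-tuned model; the toy nn.Linear models, the helper names, and the 0.5 scaling factor are placeholder assumptions.

```python
# Minimal sketch of safety restoration via task arithmetic (assumed, simplified).
# Idea: compute a "safety vector" as (aligned weights - unaligned weights),
# then shift a fine-tuned model's weights by a scaled copy of that vector.
import torch
import torch.nn as nn


def safety_vector(aligned: nn.Module, unaligned: nn.Module) -> dict:
    """Per-parameter difference: aligned weights minus unaligned weights."""
    aligned_sd, unaligned_sd = aligned.state_dict(), unaligned.state_dict()
    return {k: aligned_sd[k] - unaligned_sd[k] for k in aligned_sd}


def add_safety_vector(model: nn.Module, vector: dict, scale: float = 0.5) -> nn.Module:
    """Shift the model's weights by `scale` times the safety vector."""
    sd = model.state_dict()
    model.load_state_dict({k: sd[k] + scale * vector[k] for k in sd})
    return model


if __name__ == "__main__":
    # Toy stand-ins for an aligned model, its unaligned variant, and a fine-tuned model.
    torch.manual_seed(0)
    aligned, unaligned, finetuned = (nn.Linear(4, 4) for _ in range(3))
    vec = safety_vector(aligned, unaligned)
    restored = add_safety_vector(finetuned, vec, scale=0.5)
    print({k: tuple(v.shape) for k, v in restored.state_dict().items()})
```

In practice the same arithmetic would be applied to the full state dict of a language model rather than toy layers, and the scaling factor would be tuned to balance restored safety against the downstream task performance gained by fine-tuning.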