/ASTPrompter

Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts.

Primary LanguagePython

Stargazers