Targeted-Manipulation-and-Deception-in-LLMs

A benchmark for evaluating the tendency of LLM agents to influence human preferences

Primary Language: Python
