Keeping language models honest by directly eliciting knowledge encoded in their activations, building on "Discovering Latent Knowledge in Language Models Without Supervision" (Burns et al., 2022).
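The Burns et al. paper introduces contrast-consistent search (CCS): a small probe is trained on a model's hidden activations for contrast pairs (the same statement framed as true and as false), using only an unsupervised consistency-plus-confidence objective. Below is a minimal PyTorch sketch of that objective, not this repository's actual implementation; the probe architecture, tensor shapes, and training loop are illustrative assumptions.

```python
# Minimal sketch of the CCS objective (Burns et al., 2022).
# All names and shapes here are illustrative, not this repo's API.
import torch
import torch.nn as nn

class CCSProbe(nn.Module):
    """Linear probe mapping a hidden state to a probability of truth."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    # Consistency: the probabilities assigned to a statement and its
    # negation should sum to 1.
    consistency = (p_pos - (1 - p_neg)) ** 2
    # Confidence: penalize the degenerate p = 0.5 everywhere solution.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Usage sketch: acts_pos / acts_neg stand in for hidden states of
# contrast pairs extracted from a language model.
hidden_dim = 768  # assumed hidden size
probe = CCSProbe(hidden_dim)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
acts_pos = torch.randn(64, hidden_dim)  # placeholder activations
acts_neg = torch.randn(64, hidden_dim)
for _ in range(100):
    opt.zero_grad()
    loss = ccs_loss(probe(acts_pos), probe(acts_neg))
    loss.backward()
    opt.step()
```

Because the objective never looks at labels, the probe's direction is found purely from the logical structure of the contrast pairs; in practice activations are typically normalized per-template before training.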
Primary language: Python · License: MIT