vincentweisser/elk
Keeping language models honest by directly eliciting knowledge encoded in their activations. Builds on "Discovering Latent Knowledge in Language Models Without Supervision" (Burns et al., 2022).
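The Burns et al. method (Contrast-Consistent Search, CCS) trains an unsupervised probe on activation pairs for a statement and its negation, asking the probe's probabilities to be consistent (p(x+) ≈ 1 − p(x−)) and confident. Below is a minimal sketch of that objective in plain NumPy; the probe shape, function names, and toy data are illustrative assumptions, not this repository's actual API.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def probe(theta, b, acts):
    # Hypothetical linear probe: maps hidden activations to P("true").
    return sigmoid(acts @ theta + b)

def ccs_loss(p_pos, p_neg):
    """Sketch of the CCS objective (Burns et al. 2022).

    p_pos: probe probabilities for statements x+
    p_neg: probe probabilities for their negations x-
    """
    # Consistency: the two probabilities should sum to 1.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate p_pos = p_neg = 0.5 solution.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return np.mean(consistency + confidence)
```

A perfectly consistent, confident probe (e.g. `p_pos = 1.0`, `p_neg = 0.0`) gets zero loss, while the uninformative probe that outputs 0.5 everywhere is penalized by the confidence term; in practice the probe parameters are optimized with gradient descent over normalized contrast-pair activations.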
Language: Python · License: MIT