codefire53/awesome-language-models-interpretability

A compilation of libraries, papers, and blogs on language model interpretability that I found interesting

License: MIT

Awesome Language Models Interpretability

This repository lists papers that align with one of my research focuses. It will be updated occasionally as I find interesting material.

Survey

A Primer on the Inner Workings of Transformer-based Language Models.

Interpretability Approaches

Knowledge Editing & Alignment

Capabilities of Language Models

Tutorials

EACL'24 Transformer-specific Interpretability