This repository explores how small open-source language models such as GPT-Neo memorize and recite paragraphs from their training data. It includes helper scripts and exploratory Jupyter notebooks. Work done by Niklas Stoehr during his winter 2023 research internship.
This is not an official Google project.
Helper scripts with basic functionality used across the notebooks:
- patching
- evaluation
- dataLoaders
- gradient
- intervening
- localizing
- modelHandlers
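As a flavor of what the evaluation helpers do, here is a minimal sketch of one common memorization metric: the fraction of reference tokens a model's greedy continuation reproduces exactly. The function name `memorization_score` is a hypothetical illustration, not necessarily the repo's actual API.

```python
def memorization_score(generated, reference):
    """Fraction of reference tokens the model reproduced exactly,
    position by position (1.0 = perfect paragraph recitation)."""
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(reference)

# The model reproduces 3 of the 4 reference token ids.
score = memorization_score([17, 42, 99, 5], [17, 42, 7, 5])
# score == 0.75
```

In practice the token lists would come from greedily decoding a paragraph's continuation and tokenizing the ground-truth continuation with the same tokenizer.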
Notebooks to reproduce the main experiments:
- explorative
- token perturbation
- activation patching
- gradient-based attribution
  - parameter gradients
  - activation gradients
- attention head analysis
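To make the activation-patching idea concrete, here is a minimal sketch using PyTorch forward hooks on a toy network (the real experiments operate on GPT-Neo's transformer layers; the toy model, the `run_with_patch` helper, and the layer indexing are illustrative assumptions, not the repo's code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stack of layers standing in for a transformer's blocks.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

def run_with_patch(model, x, layer_idx=None, patch=None):
    """Run `model` on `x`, caching each layer's output; if `layer_idx`
    is given, overwrite that layer's output with `patch` so all
    downstream computation sees the substituted activation."""
    cache, hooks = {}, []
    for i, layer in enumerate(model):
        def hook(module, inp, out, i=i):
            if i == layer_idx and patch is not None:
                out = patch
            cache[i] = out.detach()
            return out  # returning a value replaces the layer output
        hooks.append(layer.register_forward_hook(hook))
    try:
        out = model(x)
    finally:
        for h in hooks:
            h.remove()
    return out, cache

clean = torch.randn(1, 4)
corrupt = torch.randn(1, 4)

# Cache activations from the clean run, then patch layer 0's clean
# activation into the corrupted run and measure the effect.
_, clean_cache = run_with_patch(model, clean)
patched_out, _ = run_with_patch(model, corrupt, layer_idx=0,
                                patch=clean_cache[0])
baseline_out, _ = run_with_patch(model, corrupt)
effect = (patched_out - baseline_out).abs().sum().item()
```

A large `effect` at a given layer suggests that layer's activations carry information that drives the output difference between the two runs, which is the basic localization signal these notebooks build on.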
CSV file of paragraphs memorized by GPT-Neo-125M
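A typical workflow splits each stored paragraph into a prompt prefix and the continuation the model must recite to count as memorized. The exact column layout of the CSV is not specified here, so the `paragraph` column name below is an assumption; the sketch uses an inline CSV string for illustration:

```python
import csv
import io

# Hypothetical layout: one memorized paragraph per row in a
# `paragraph` column (the repo's actual schema may differ).
sample = "paragraph\nThe quick brown fox jumps over the lazy dog.\n"
rows = list(csv.DictReader(io.StringIO(sample)))

def split_paragraph(text, prefix_words=5):
    """Split a paragraph into a prompt prefix and the target
    continuation used to test for verbatim recitation."""
    words = text.split()
    return " ".join(words[:prefix_words]), " ".join(words[prefix_words:])

prefix, continuation = split_paragraph(rows[0]["paragraph"])
# prefix == "The quick brown fox jumps"
# continuation == "over the lazy dog."
```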