
Code samples for some recent work

Primary language: Python

Code samples

This repository contains two recent code samples:

  • training-interpretability: A reproduction of induction heads, with gradients captured during training to analyze how the heads form.
  • training-process-transparency: A slightly older experiment in training a 3-layer model from scratch and identifying mono- and polysemantic neurons directly from gradient analysis.
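For context on the first sample, a common way to check whether an attention head has become an induction head is to feed the model a random token sequence repeated twice. You then measure how much each token in the second copy attends to the token that followed its earlier occurrence. The sketch below is not taken from training-interpretability. It assumes you already have one head's attention pattern as a nested list, and `induction_score`, `formation_step`, and the 0.4 threshold are hypothetical names and values chosen for illustration. Tracking the score across checkpoints gives a rough point at which the head forms, and captured gradients could then be lined up against that point.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical helpers for illustration, not the repository's code.
Pattern = Sequence[Sequence[float]]  # attn[query_pos][key_pos], each row sums to 1


def induction_score(attn: Pattern, half_len: int) -> float:
    """Mean attention from each token in the second copy to its induction target.

    Assumes the input was a random sequence of `half_len` tokens repeated twice,
    so the token at position t (t >= half_len) also appeared at t - half_len, and
    an ideal induction head attends to the position after it, t - half_len + 1.
    """
    if half_len < 1 or len(attn) != 2 * half_len:
        raise ValueError("expected an attention pattern over 2 * half_len positions")
    total = sum(attn[t][t - half_len + 1] for t in range(half_len, 2 * half_len))
    return total / half_len


def formation_step(scores: Mapping[int, float], threshold: float = 0.4) -> Optional[int]:
    """Return the first training step whose induction score reaches `threshold`, else None."""
    for step in sorted(scores):
        if scores[step] >= threshold:
            return step
    return None


# Usage with a hand-built ideal pattern (T = 3 tokens, repeated once).
T = 3
ideal = [[0.0] * (2 * T) for _ in range(2 * T)]
for t in range(2 * T):
    # Attend to self in the first copy, to the induction target in the second.
    target = t if t < T else t - T + 1
    ideal[t][target] = 1.0
print(induction_score(ideal, T))  # 1.0

# Made-up per-checkpoint scores, purely to show the call.
print(formation_step({0: 0.02, 500: 0.1, 1000: 0.65, 1500: 0.8}))  # 1000
```

A per-checkpoint threshold crossing is a deliberately coarse summary. Plotting the full score curve is usually more informative when the transition is gradual.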
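For the second sample, one simple heuristic for separating monosemantic from polysemantic neurons is to look at how a neuron's gradient-based attribution is spread across labeled concepts. Mass concentrated on one concept suggests a monosemantic neuron, and mass spread over many suggests a polysemantic one. The sketch below assumes the per-neuron, per-concept attribution scores (for example, mean |activation × gradient| over inputs from each concept) have already been collected. The entropy-based `concentration` score, `classify_neurons`, the neuron names, and the 0.5 cutoff are all illustrative choices, not the method used in training-process-transparency.

```python
import math
from typing import Dict, Mapping, Sequence

# Hypothetical helpers for illustration, not the repository's code.


def concentration(attributions: Sequence[float]) -> float:
    """1 minus the normalized Shannon entropy of |attribution| across concepts.

    Returns 1.0 when all mass sits on one concept and 0.0 when it is spread
    evenly. Signs are discarded; only magnitudes are compared.
    """
    mags = [abs(a) for a in attributions]
    total = sum(mags)
    if len(mags) < 2 or total == 0.0:
        raise ValueError("need at least two concepts and nonzero attribution mass")
    probs = [m / total for m in mags if m > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(mags))


def classify_neurons(
    scores: Mapping[str, Sequence[float]], threshold: float = 0.5
) -> Dict[str, str]:
    """Label each neuron 'monosemantic', 'polysemantic', or 'dead' (no gradient signal)."""
    labels: Dict[str, str] = {}
    for neuron, attributions in scores.items():
        if all(a == 0.0 for a in attributions):
            labels[neuron] = "dead"
        elif concentration(attributions) >= threshold:
            labels[neuron] = "monosemantic"
        else:
            labels[neuron] = "polysemantic"
    return labels


# Made-up attribution scores over four concepts, purely to show the call.
example = {
    "L2.n17": [0.9, 0.0, 0.05, 0.05],
    "L2.n42": [0.3, 0.25, 0.2, 0.25],
    "L2.n99": [0.0, 0.0, 0.0, 0.0],
}
print(classify_neurons(example))
# {'L2.n17': 'monosemantic', 'L2.n42': 'polysemantic', 'L2.n99': 'dead'}
```

Normalizing the entropy by log(number of concepts) keeps the score in [0, 1], so one cutoff can be reused across layers with different concept sets. The cutoff itself would need tuning against neurons whose behavior is already known.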