This repository contains two recent code samples:
training-interpretability
: A reproduction of induction heads that captures gradients to analyze how the heads form.
training-process-transparency
: A slightly older experiment in training a 3-layer model from scratch and identifying mono- and polysemantic neurons directly from gradient analysis.