mats-application: A Jupyter Notebook repository from codeboy5

Overall, I think this was a good way to get hands-on experience working with sparse autoencoders (SAEs) and applying mechanistic interpretability (MI) concepts to a problem.

MATS 6.0 Application for Neel Nanda's Stream

The MATS 6.0 Application Google Doc summarizes my findings. I worked on finding interesting SAE features inside a one-layer (1L) transformer model and reverse-engineering them.

Some of the interesting features I looked at were:

A 't feature that activates on the 't token in words like don't, doesn't, etc.
A context-dependent feature that activates on the tokens "or", "and" and "," in texts related to phones, emails, messages, etc.
A possible close-brackets feature. It activates on tokens immediately following an opening bracket ( and boosts the logits of closing brackets ).
A relatively non-sparse feature that seems to activate on tokens that follow the "of" token.
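
To make "a feature activates on a token" and "a feature boosts the logits of a token" concrete, here is a minimal sketch in plain Python. It is not the code from the notebooks. It assumes a standard ReLU SAE and a one-layer model where a feature's decoder direction is mapped straight through the unembedding, ignoring any final layer norm. The names W_enc, W_dec, W_U, b_enc and b_dec, the tiny dimensions, and the random weights are all hypothetical stand-ins for the trained model and SAE.

```python
import random

def relu(values):
    return [max(0.0, v) for v in values]

def matvec(vec, mat):
    """Row vector (length n) times an n x m matrix, returning length m."""
    n_cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(n_cols)]

def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations for one token's activation vector x (assumed ReLU SAE)."""
    centered = [a - b for a, b in zip(x, b_dec)]
    pre_acts = matvec(centered, W_enc)
    return relu([p + b for p, b in zip(pre_acts, b_enc)])

def feature_logit_effect(feature_idx, W_dec, W_U):
    """Direct effect of one feature on every vocabulary logit (layer norm ignored)."""
    return matvec(W_dec[feature_idx], W_U)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes and random weights, for illustration only.
random.seed(0)
d_model, n_features, d_vocab = 4, 6, 5
W_enc = rand_matrix(d_model, n_features)
W_dec = rand_matrix(n_features, d_model)
W_U = rand_matrix(d_model, d_vocab)
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model

x = [random.gauss(0, 1) for _ in range(d_model)]  # activation at one token position
acts = sae_encode(x, W_enc, b_enc, b_dec)         # which features fire on this token
effect = feature_logit_effect(0, W_dec, W_U)      # which tokens feature 0 promotes
top_token = max(range(d_vocab), key=lambda t: effect[t])
```

With a real model, a feature like the close-brackets one would be expected to show a large activation on tokens after ( and its largest logit effect on the ) token. The random weights here demonstrate only the mechanics.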
