openai/automated-interpretability

Getting Top Activating Text Excerpts Per Neuron


Hello,

Could you please clarify how you get the top activating text excerpts for each neuron? Do you average the activation values over all tokens in an excerpt, or do you sum them?

We take the maximum over all activations in each text excerpt, then select the text excerpts with the highest maximum activation values.
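The ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code. It assumes a hypothetical input layout: for a single neuron, `activations` is a list with one entry per text excerpt, and each entry is a list of that neuron's per-token activation values. The function name `top_activating_excerpts` is also hypothetical.

```python
from typing import Sequence


def top_activating_excerpts(
    activations: Sequence[Sequence[float]], k: int
) -> list[tuple[int, float]]:
    """Return (excerpt_index, max_activation) for the k highest-scoring excerpts.

    Hypothetical layout: activations[i] holds one neuron's per-token
    activation values for text excerpt i.
    """
    # Score each excerpt by the MAXIMUM activation over its tokens,
    # not the mean or the sum. Empty excerpts are skipped.
    scores = [(i, max(acts)) for i, acts in enumerate(activations) if acts]
    # Rank excerpts by that score, highest first, and keep the top k.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


excerpts = [
    [0.1, 2.0, 0.3],  # one strong token: max 2.0, mean 0.8
    [1.5, 1.5, 1.5],  # uniformly moderate: max 1.5, mean 1.5
    [0.0, 3.2],       # one very strong token: max 3.2
]
print(top_activating_excerpts(excerpts, k=2))  # [(2, 3.2), (0, 2.0)]
```

In this example the second excerpt has the highest mean and sum, yet it is ranked last. Ranking by the maximum favors excerpts that contain at least one token where the neuron fires strongly.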