locuslab/massive-activations

Which layer's activation is used?

Opened this issue · 1 comment

Hello,

This is great work! I'm wondering which layer the analyzed activations come from. Is it the last layer?

This is addressed in Section 2.1, "Which Layers?":

In LLaMA2-7B, massive activations first appear in layer 2 and remain at nearly constant values until layer 30. Intriguingly, for LLaMA2-7B and 13B, massive activations emerge abruptly within a single layer of computation (layer 2 and layer 4, respectively). This means that they do not emerge through gradual accumulation over many layers, but are instead caused by a rather different mechanism.
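
For reference, here is a minimal sketch of how you could verify this yourself by inspecting per-layer hidden states. It is not code from this repo; it assumes access to the gated `meta-llama/Llama-2-7b-hf` checkpoint and uses the standard Hugging Face `transformers` API.

```python
# Sketch: print the largest activation magnitude at each layer of
# LLaMA2-7B to see where massive activations first emerge.
# Assumes the meta-llama/Llama-2-7b-hf checkpoint (gated on HF Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Summer is warm. Winter is cold.",
                   return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states[0] is the embedding output;
# outputs.hidden_states[i] is the output of decoder layer i.
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i:2d}: max |activation| = {h.abs().max().item():.1f}")
```

Running something like this should show the maximum magnitude jumping by orders of magnitude at layer 2 and staying roughly flat until the final layers, matching the passage quoted above.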