locuslab/massive-activations

Which layer's activation is used?

Opened this issue · 1 comment

Hello,

This is great work! I'm wondering which layer the analyzed activations come from. Is it the last layer?

This is addressed in Section 2.1, "Which Layers?":

In LLaMA2-7B, massive activations first appear in layer 2 and remain at nearly constant values until layer 30. Intriguingly, for LLaMA2-7B and 13B, massive activations emerge abruptly within a single layer of computation (layer 2 and layer 4, respectively). This means that they do not emerge through gradual accumulation over many layers, but are instead caused by a rather different mechanism.
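
For reference, here is a minimal sketch of how you could verify this yourself by inspecting per-layer hidden states. It is not code from this repo; it assumes access to the gated `meta-llama/Llama-2-7b-hf` checkpoint and uses the standard Hugging Face `transformers` API.

```python
# Sketch: print the largest activation magnitude at each layer of
# LLaMA2-7B to see where massive activations first emerge.
# Assumes the meta-llama/Llama-2-7b-hf checkpoint (gated on HF Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Summer is warm. Winter is cold.",
                   return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states[0] is the embedding output;
# outputs.hidden_states[i] is the output of decoder layer i.
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i:2d}: max |activation| = {h.abs().max().item():.1f}")
```

Running something like this should show the maximum magnitude jumping by orders of magnitude at layer 2 and staying roughly flat until the final layers, matching the passage quoted above.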