locuslab/massive-activations

the standard deviation of the activation


Hello, I am interested in the standard deviation of the activations and would like to know how it (or the variance) is calculated. Here are a few possible methods:

  1. Calculate the variance over 100 sequences, separately for each specific layer, as displayed in the table below.
  2. Calculate the variance over 100 sequences, pooled across the layers with relatively large values (e.g., layers 2-30).
  3. Calculate the variance pooled across all layers.

Could you please specify which of the above situations applies?

Thanks.
[Image: screenshot of the table in question]

Thanks for your interest in our work. That would be option 1. The table shows the standard deviation of the activations within a fixed layer.

Thanks a lot.
So we just calculate the standard deviation of 100 values. Taking the top-1 activation as an example: suppose it is the 2533rd dimension of the starting token in the 15th layer. We collect that value from each of the 100 sequences and then compute the standard deviation of those 100 values.

Yes, that's correct.
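For illustration, the procedure confirmed above (fix one layer, one token position, and one feature dimension, read that single value from each of the 100 sequences, then take the standard deviation) can be sketched as follows. This is a minimal sketch, not the repository's code: the nested layout `activations[s][layer][token][dim]`, the helper names, and the toy data generator are all hypothetical, and the thread does not say whether the population (divide by n) or sample (divide by n - 1) formula is used, so both are exposed.

```python
import random
import statistics


def collect_values(activations, layer, token, dim):
    """Gather one activation coordinate from every sequence.

    Hypothetical layout: activations[s][layer][token][dim] is the hidden-state
    value of sequence s. Adapt the indexing to your own tensors.
    """
    return [seq[layer][token][dim] for seq in activations]


def coordinate_std(activations, layer, token, dim, unbiased=False):
    """Standard deviation of a single coordinate across sequences.

    unbiased=False uses the population formula (divide by n);
    unbiased=True divides by n - 1. Which one the table uses is an assumption.
    """
    values = collect_values(activations, layer, token, dim)
    return statistics.stdev(values) if unbiased else statistics.pstdev(values)


def make_toy_activations(n_seqs=100, n_layers=3, n_tokens=4, n_dims=8, seed=0):
    """Random toy data with one planted large coordinate at layer 2, token 0, dim 5."""
    rng = random.Random(seed)
    acts = []
    for _ in range(n_seqs):
        seq = [[[rng.gauss(0.0, 1.0) for _ in range(n_dims)]
                for _ in range(n_tokens)]
               for _ in range(n_layers)]
        seq[2][0][5] = 1000.0 + rng.gauss(0.0, 5.0)  # large but fairly stable value
        acts.append(seq)
    return acts


if __name__ == "__main__":
    acts = make_toy_activations()
    values = collect_values(acts, layer=2, token=0, dim=5)
    print(len(values), statistics.mean(values), coordinate_std(acts, 2, 0, 5))
```

With a real model, the toy generator would be replaced by per-layer hidden states from 100 input sequences, and the indices would point at the coordinate of interest (for the hypothetical top-1 example above, layer 15, the starting token, dimension 2533, keeping in mind whether your indexing is 0- or 1-based).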