The standard deviation of the activations
Cooperx521 commented
Hello, I am interested in the standard deviation of the activations and would like to know how the variance is calculated. Here are the possibilities I can think of:
- Compute the variance over 100 sequences for a single, specific layer, and report that value in the table.
- Compute the variance over 100 sequences across the layers with relatively large activations (e.g., layers 2-30).
- Compute the variance across all layers.
Could you please specify which of the above situations applies?
Eric-mingjie commented
Thanks for your interest in our work. It is option 1: the table shows the standard deviation of the activations within a fixed layer.
Cooperx521 commented
Thanks a lot.
So we simply compute the standard deviation of 100 values. Take the top-1 activation as an example: it might be dimension 2533 of the starting token in layer 15. We collect that value from each of the 100 sequences and then compute the standard deviation of those 100 values.
Eric-mingjie commented
Yes, that's correct.
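The procedure confirmed above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes activations are stored as nested lists indexed `[sequence][layer][token][dim]`, and the helper name `activation_std` is hypothetical. The thread does not say whether the population or the sample standard deviation was used, so this sketch uses the population form (`statistics.pstdev`) as an assumption.

```python
import random
import statistics


def activation_std(activations, layer, token, dim):
    """Standard deviation of one activation coordinate across sequences.

    activations: hypothetical nested-list layout indexed as
    [sequence][layer][token][dim].
    """
    # Collect the same (layer, token, dim) value from every sequence,
    # e.g. 100 values for 100 sequences.
    values = [seq[layer][token][dim] for seq in activations]
    # Population std (ddof=0); the choice of ddof is an assumption.
    return statistics.pstdev(values)


# Toy random data only (not real model activations):
# 100 sequences, 16 layers, 4 tokens, 8 hidden dimensions.
random.seed(0)
acts = [
    [[[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)] for _ in range(16)]
    for _ in range(100)
]
# Fixed layer, starting token (index 0), one hidden dimension.
print(activation_std(acts, layer=15, token=0, dim=3))
```

With a real model, `layer`, `token`, and `dim` would point at the coordinate of interest (for example, dimension 2533 of the starting token in layer 15). Whether those indices are zero-based depends on the model code, which this sketch does not assume.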