locuslab/massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
Python · MIT
Issues
the standard deviation of the activation
#7 opened by Cooperx521 - 3
Training only on 2B tokens (openwebtext)
#5 opened by Nandan91 - 1
Which layer's activation is used?
#1 opened by iyupan