AnswerDotAI/cold-compress

Profile Llama3 Attention Heads

griff4692 opened this issue · 0 comments

See this section of the writeup.

This issue involves implementing the paper below in gpt-fast and running a sanity check to see whether the profiled attention heads are accurate for Llama-3.
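
As a rough idea of what a sanity check could look like, here is a minimal sketch that scores each head by how much attention mass it places on recent tokens. The paper's actual profiling criteria are not stated in this issue, so the statistic, the function name `local_attention_mass`, and the `window` parameter are all illustrative assumptions, not the paper's method or a gpt-fast API.

```python
def local_attention_mass(attn, window):
    """Hypothetical per-head statistic (an assumption, not the paper's metric).

    attn[h][q][k] holds causal attention weights for head h, where each
    query row sums to 1. Returns, for each head, the mean fraction of
    attention mass a query places on its `window` most recent keys
    (the query's own position included).
    """
    scores = []
    for head in attn:
        total = 0.0
        for q, row in enumerate(head):
            start = max(0, q - window + 1)
            total += sum(row[start:q + 1])
        scores.append(total / len(head))
    return scores


# Toy input: 2 heads over 3 tokens.
# Head 0 attends only to the current token (purely local).
# Head 1 always attends to the first token.
attn = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
]
scores = local_attention_mass(attn, window=1)
```

On real Llama-3 attention maps, a head that the profile labels as local would be expected to score near 1 for a small window, while a head labeled as global would score much lower; a large mismatch would suggest the profile needs a second look.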