hackerllama/blog/posts/hitchhiker_guide/
utterances-bot opened this issue · 8 comments
utterances-bot commented
hackerllama - The Llama Hitchiking Guide to Local LLMs
https://osanseviero.github.io/hackerllama/blog/posts/hitchhiker_guide/
DrChrisLevy commented
This is amazing, thanks!
2404589803 commented
great job!
fpaupier commented
Great overview of the different concepts, discovered many new ones! Thanks @osanseviero
havenqi commented
Great job! Bookmarking this post.
FelikZ commented
Good stuff. Would be nice to have a dive into Embeddings and tooling around it.
sanzgadea commented
Good post! One comment: Flash Attention is not an approximation of attention; it is exact, meaning it computes the same attention values as the standard formulation. It achieves its speedup through optimized memory access (tiling the computation to fit in fast on-chip SRAM) and parallelism, not by changing the math.
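To illustrate the point above, here is a minimal NumPy sketch (not the actual CUDA kernel) of the online-softmax trick Flash Attention uses: scores are processed in blocks with running max and normalizer accumulators, so the full attention matrix is never materialized, yet the result matches naive attention exactly up to floating-point rounding.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: softmax(Q K^T / sqrt(d)) V,
    # materializing the full N x N score matrix.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def tiled_attention(Q, K, V, block=4):
    # Flash-Attention-style pass: visit K/V in blocks, keeping a running
    # row max (m) and softmax denominator (l), rescaling previous partial
    # sums as the max updates. The arithmetic is exact, not approximate.
    d = Q.shape[-1]
    n = Q.shape[0]
    out = np.zeros_like(Q, dtype=float)
    m = np.full((n, 1), -np.inf)   # running row-wise max of scores
    l = np.zeros((n, 1))           # running softmax normalizer
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Q @ Kb.T / np.sqrt(d)                        # block of scores
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)                            # block weights
        scale = np.exp(m - m_new)                        # rescale old state
        l = l * scale + p.sum(axis=-1, keepdims=True)
        out = out * scale + p @ Vb
        m = m_new
    return out / l

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 8, 16))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The blocked version never stores more than a `block`-wide slice of the score matrix, which is where the memory savings (and, on GPUs, the bandwidth savings) come from.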
sugatoray commented
This is an incredibly useful article. Thank you @osanseviero for maintaining this.
vighneshpp1986 commented
Very helpful!