osanseviero/hackerllama


utterances-bot opened this issue · 8 comments

hackerllama - The Llama Hitchiking Guide to Local LLMs

https://osanseviero.github.io/hackerllama/blog/posts/hitchhiker_guide/

This is amazing, thanks!

great job!

Great overview of the different concepts; I discovered many new ones! Thanks @osanseviero

Great job! Bookmarking this post.

Good stuff. It would be nice to have a deeper dive into embeddings and the tooling around them.

Good post! One comment: Flash Attention is not an approximation of attention; it is exact, computing the same result as standard attention. It achieves its speedup through optimized memory access (avoiding materializing the full attention matrix) and parallel processing.
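To make the "exact, not approximate" point concrete, here is a minimal sketch in plain PyTorch (not the actual FlashAttention kernel): attention computed block by block with a running softmax matches the naive computation to floating-point precision. The tensor shapes and block size are arbitrary, picked only for illustration.

```python
import torch

torch.manual_seed(0)
q = torch.randn(8, 64)   # (seq_len, head_dim), hypothetical sizes
k = torch.randn(8, 64)
v = torch.randn(8, 64)
scale = q.shape[-1] ** -0.5

# Naive attention: materializes the full (seq_len x seq_len) score matrix.
naive = torch.softmax(q @ k.T * scale, dim=-1) @ v

# Tiled attention: process keys/values in blocks, keeping running softmax
# statistics (per-row max and normalizer) so the full matrix is never stored.
block = 2
out = torch.zeros_like(q)
row_max = torch.full((q.shape[0], 1), float("-inf"))
row_sum = torch.zeros(q.shape[0], 1)
for start in range(0, k.shape[0], block):
    kb, vb = k[start:start + block], v[start:start + block]
    scores = q @ kb.T * scale                      # scores for this block only
    new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
    correction = torch.exp(row_max - new_max)      # rescale earlier partial sums
    p = torch.exp(scores - new_max)
    row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
    out = out * correction + p @ vb
    row_max = new_max
tiled = out / row_sum

print(torch.allclose(naive, tiled, atol=1e-5))  # True: identical result, no approximation
```

The speedup of the real kernel comes from doing this tiling inside fast on-chip SRAM and fusing the steps, not from changing the math.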

This is an incredibly useful article. Thank you @osanseviero for maintaining this.

Very helpful!