maxbbraun/llama4micro

Try llama.cpp/ggml

maxbbraun opened this issue · 0 comments

The main reason I initially chose karpathy/llama2.c over ggerganov/llama.cpp was that the former comes with very small (15M-parameter) models out of the box.

llama.cpp, and ggml more generally, is a more mature system with a number of optimizations, including 4-bit quantization. Seems worth a try! Might have to train a right-sized model from scratch, though.
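For context, a simplified sketch of what ggml-style 4-bit block quantization (Q4_0) does: split the weights into fixed-size blocks, store one float scale per block, and pack each value as a 4-bit integer. The block size of 32 and the scale convention below are assumptions based on my reading of ggml, not a faithful reimplementation:

```python
import numpy as np

BLOCK_SIZE = 32  # assumed block size, as in ggml's Q4_0

def quantize_q4_0(x: np.ndarray):
    """Blockwise 4-bit quantization: one float scale per block,
    values stored as unsigned 4-bit ints in [0, 15]."""
    blocks = x.reshape(-1, BLOCK_SIZE)
    # Use the value with the largest magnitude in each block to set the scale.
    amax_idx = np.argmax(np.abs(blocks), axis=1)
    max_vals = blocks[np.arange(blocks.shape[0]), amax_idx]
    d = max_vals / -8.0  # per-block scale (sign convention assumed)
    d[d == 0] = 1.0      # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / d[:, None]) + 8, 0, 15).astype(np.uint8)
    return d, q

def dequantize_q4_0(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Recover approximate floats from scales and 4-bit codes."""
    return (q.astype(np.float32) - 8) * d[:, None]
```

Each block of 32 weights then costs 32 × 4 bits plus one scale, roughly 4.5 bits per weight, which is where the memory savings over fp16/fp32 come from.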

Potentially relevant examples: