Try llama.cpp/ggml
maxbbraun opened this issue · 0 comments
maxbbraun commented
The main reason I initially chose karpathy/llama2.c over ggerganov/llama.cpp was that the former comes out of the box with very small (15M) models.
llama.cpp, and ggml more generally, is a more mature system with a number of optimizations, including 4-bit quantization. Seems worth a try! Might have to train a right-sized model from scratch, though.
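To get a feel for what "right-sized" means here, a quick back-of-the-envelope parameter count is easy to sketch. The config values below are assumptions (they match the published llama2.c `stories15M` checkpoint: `dim=288`, 6 layers, FFN hidden size 768, vocab 32000), not anything prescribed by llama.cpp:

```python
# Rough parameter count for a llama2.c-style tiny transformer.
# All config values are assumed; adjust them to size your own model.
dim, n_layers, hidden_dim, vocab_size = 288, 6, 768, 32000

embedding = vocab_size * dim      # token embeddings (tied with the output head)
attention = 4 * dim * dim         # wq, wk, wv, wo projections per layer
ffn = 3 * dim * hidden_dim        # w1, w2, w3 (SwiGLU) per layer

total = embedding + n_layers * (attention + ffn)
print(f"~{total / 1e6:.1f}M parameters")
```

This lands at roughly 15M parameters (ignoring the small RMSNorm weights), which matches the checkpoint size and gives a starting point for scaling a from-scratch model up or down.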
Potentially relevant examples: