This project was derived from https://github.com/karpathy/llama2.c to run multi-threaded inference. Running inference with this Rust port is 3+ times faster than with the original llama2.c.
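Most of the speedup in this kind of port comes from parallelizing the matrix-vector products that dominate transformer inference. Below is a minimal sketch of how such a matmul can be spread across threads with the `rayon` crate; the function name `matmul` and the layout (row-major weights `w` of shape `n x d` applied to an activation vector `x` of length `d`) mirror llama2.c's convention, but this is an illustrative assumption, not necessarily this repository's exact implementation.

```rust
// Assumes a Cargo dependency: rayon = "1".
// Illustrative sketch only; not necessarily the code used in this repo.
use rayon::prelude::*;

/// Computes xout = W @ x, where W is n rows x d cols, stored row-major.
/// Each output element is an independent dot product, so the rows can
/// be computed on separate threads without any synchronization.
fn matmul(xout: &mut [f32], x: &[f32], w: &[f32], n: usize, d: usize) {
    debug_assert_eq!(xout.len(), n);
    debug_assert_eq!(x.len(), d);
    xout.par_iter_mut().enumerate().for_each(|(i, out)| {
        let row = &w[i * d..(i + 1) * d];
        *out = row.iter().zip(x.iter()).map(|(&wi, &xi)| wi * xi).sum();
    });
}

fn main() {
    // Tiny smoke test: a 2x3 matrix times a length-3 vector.
    let w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = [1.0, 0.5, 2.0];
    let mut xout = [0.0f32; 2];
    matmul(&mut xout, &x, &w, 2, 3);
    println!("{:?}", xout); // [8.0, 18.5]
}
```

Because each row's dot product is independent, this parallelization changes no results, only wall-clock time: rayon's work-stealing thread pool keeps all cores busy across the many matmuls performed per generated token.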