Inference code for LLaMA models

# Llama 2 on CPU

This is a fork of https://github.com/facebookresearch/llama that runs on CPU.

Installation and usage are identical to the official repository, so please refer to its instructions.
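As a sketch, the text-completion example from the official repository should run against this fork in the same way; the checkpoint directory and tokenizer path below are placeholders, so substitute the locations where you downloaded the weights:

```shell
# Run the Llama 2 text-completion example (from the official repo) on CPU.
# --ckpt_dir and --tokenizer_path are placeholders for your local files.
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```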

On an M1 MacBook Pro, the 7B model generates roughly one word every 1.5 seconds.