Llama 2 on CPU and Mac M1/M2 GPU

This is a fork of https://github.com/facebookresearch/llama that runs on the CPU, or on the Mac M1/M2 GPU (mps) when one is available.
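A minimal sketch of the "mps if available, otherwise CPU" choice, assuming PyTorch as in the upstream repository. This is not the fork's actual code. `select_device` is a hypothetical helper. `torch.backends.mps.is_available()` is the real PyTorch check for the Apple-silicon GPU backend (PyTorch 1.12 and later).

```python
from typing import Optional


def select_device(mps_available: Optional[bool] = None) -> str:
    """Hypothetical helper: return "mps" if the Apple GPU backend is usable, else "cpu".

    When mps_available is None, PyTorch is queried. Passing a bool lets the
    selection logic be exercised without PyTorch installed.
    """
    if mps_available is None:
        import torch  # deferred import so the helper can be tested without torch
        mps_available = torch.backends.mps.is_available()
    return "mps" if mps_available else "cpu"
```

In a PyTorch script, the result would typically be wrapped as `torch.device(select_device())` and passed to `model.to(...)` and to the tensors created during generation.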

Please refer to the official installation and usage instructions, as they are exactly the same.

[Figure: MacBook Pro M1 with the 7B model]

During text generation, an extra message also reports how many tokens have been generated and how fast they are being produced.
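A hedged sketch of how such a token-count and speed report could be produced. It assumes tokens arrive from an iterable, such as a streaming generation loop. `timed_tokens` and `format_token_stats` are hypothetical names, and the message wording is illustrative, not the fork's actual output.

```python
import time
from typing import Callable, Iterable, Iterator


def format_token_stats(num_tokens: int, elapsed_s: float) -> str:
    """Hypothetical message format: token count plus tokens per second."""
    rate = num_tokens / elapsed_s if elapsed_s > 0 else float("inf")
    return f"{num_tokens} tokens generated at {rate:.2f} tokens/s"


def timed_tokens(tokens: Iterable[int],
                 report: Callable[[str], None] = print) -> Iterator[int]:
    """Yield tokens unchanged, then report how many were produced and how fast."""
    start = time.perf_counter()  # monotonic, high-resolution wall clock
    count = 0
    for tok in tokens:
        count += 1
        yield tok
    elapsed = time.perf_counter() - start
    report(format_token_stats(count, elapsed))
```

Because this is a generator, the measured time runs from the first token requested until the stream is exhausted, so it also includes any work the caller does between tokens.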

krychu/llama