git submodule init
git submodule update
make install # we use conda to manage the environment
Module | Description |
---|---|
llama | A fork of Facebook's original LLaMA implementation. |
llama.cpp | ggerganov's impressive port that runs LLaMA on CPUs; lightly modified to work in this playground. |
See LLaMA for details.
# obtain the original LLaMA model weights and place them in ./models
ls ./models
# Expected output:
# 65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
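Before running the build targets, the layout above can be checked programmatically. The helper below and its `REQUIRED` list are illustrative assumptions, not part of the playground:

```python
# Illustrative pre-flight check for the ./models layout shown above.
# REQUIRED and missing_weights are hypothetical helpers, not repo code.
from pathlib import Path

REQUIRED = ["7B", "tokenizer.model", "tokenizer_checklist.chk"]

def missing_weights(models_dir="./models"):
    """Return the expected entries that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

`missing_weights()` returns an empty list once the 7B weights and tokenizer files are in place.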
# The following commands run the example with models of different sizes on the GPU.
make 7B
make 13B
make 30B
# quantize the model to 4-bits
make quantize
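For intuition about what quantization does, here is a minimal sketch of block-wise symmetric 4-bit quantization in the spirit of llama.cpp's Q4 formats; the block size, rounding, and layout here are assumptions for illustration, not the exact on-disk format:

```python
# Sketch of block-wise symmetric 4-bit quantization (illustrative only;
# llama.cpp's actual Q4 formats pack nibbles and differ in detail).
QK = 32  # weights quantized together, sharing one floating-point scale

def quantize_block(weights):
    amax = max(abs(w) for w in weights)
    scale = amax / 7.0 if amax else 1.0      # targets signed range [-7, 7]
    quants = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, quants

def dequantize_block(scale, quants):
    return [scale * q for q in quants]
```

Round-tripping a block loses at most half a quantization step per weight, which is why 4-bit models remain usable while shrinking memory roughly fourfold versus fp16.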
# run the inference example
make cpu_infer
# Let's talk with Bob (interactive mode)!
make bob
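A rough sketch of what an interaction loop like Bob's does: append each user turn to a growing dialog transcript and let the model continue it. `generate` and the prompt format below are hypothetical stand-ins, not the playground's actual API:

```python
# Hypothetical interaction loop; `generate` stands in for the model call.
def chat(generate, user_turns, prompt="Transcript of a dialog with Bob.\n"):
    history = prompt
    replies = []
    for user in user_turns:
        history += f"User: {user}\nBob:"
        reply = generate(history)       # model continues the transcript
        history += reply + "\n"
        replies.append(reply)
    return replies
```

An interactive session would read turns from stdin instead of a list, but the transcript-accumulation logic is the same.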