
LLaMA Playground


Install

git submodule init
git submodule update
make install # We use conda as our environment management system
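Before running `make install`, it is worth confirming that the submodules were actually fetched. This is a minimal sketch, not part of the repo; the paths `llama` and `llama.cpp` are assumptions based on the module table below.

```shell
# Sanity-check that the git submodules were fetched (paths are assumptions).
check_submodules() {
  for mod in "$@"; do
    # a checked-out submodule has a .git directory or a .git gitlink file
    if [ -e "$mod/.git" ]; then
      echo "$mod: initialized"
    else
      echo "$mod: not initialized"
    fi
  done
}

check_submodules llama llama.cpp
```

If a module reports "not initialized", re-run `git submodule update --init`.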

Module Overview

Module      Description
llama       A fork of Facebook's original LLaMA release.
llama.cpp   ggerganov's llama.cpp; his amazing work lets LLaMA run on CPUs. We made minor changes so it works in this playground.

Download Models

See LLaMA for details.

Usage

# obtain the original LLaMA model weights and place them in ./models
ls ./models
# Expected output:
# 65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
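A quick way to catch a misplaced download is to check the layout against the listing above. This is a minimal sketch, not a script shipped with the repo:

```shell
# Confirm the weights landed where the Makefile expects them
# (file names taken from the directory listing above).
check_models() {
  dir="${1:-./models}"
  for f in 7B 13B 30B 65B tokenizer.model tokenizer_checklist.chk; do
    if [ -e "$dir/$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
    fi
  done
}
```

The official release also ships checksum files alongside the weights, so the download can typically be verified with `md5sum` as well.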

Testing on GPU

# The following commands run the example on GPU with models of different sizes.
make 7B
make 13B
make 30B
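LLaMA's published checkpoints are sharded for model parallelism, so the launcher needs one process per shard. The mapping below follows the official checkpoint layout; the `torchrun` line in the comment is an assumption about what these make targets wrap, not the repo's actual Makefile.

```shell
# Model-parallel (MP) shard count per published LLaMA checkpoint.
mp_for_model() {
  case "$1" in
    7B)  echo 1 ;;
    13B) echo 2 ;;
    30B) echo 4 ;;
    65B) echo 8 ;;
    *)   echo 0 ;;
  esac
}
# e.g. `make 13B` might expand to something like (hypothetical):
#   torchrun --nproc_per_node "$(mp_for_model 13B)" example.py \
#     --ckpt_dir ./models/13B --tokenizer_path ./models/tokenizer.model
```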

Testing on CPU

# quantize the model to 4 bits
make quantize

# run the inference example
make cpu_infer
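Under the hood, `make quantize` presumably drives llama.cpp's classic two-step CPU path: convert the PyTorch weights to ggml f16, then quantize to 4 bits. The sketch below only assembles the likely commands; script names and paths are assumptions, not taken from this repo's Makefile.

```shell
# Echo the two-step conversion/quantization commands for a given model size.
quantize_cmds() {
  model="${1:-7B}"
  echo "python3 convert-pth-to-ggml.py models/$model/ 1"
  echo "./quantize models/$model/ggml-model-f16.bin models/$model/ggml-model-q4_0.bin 2"
}

quantize_cmds 7B
```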

# Let's talk with Bob (interactive mode)!
make bob
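llama.cpp ships an interactive "chat with Bob" example, which `make bob` most likely starts along the lines of the command echoed below. The binary name, prompt file, and default model path here are assumptions about the wrapped invocation.

```shell
# Echo a plausible interactive-mode invocation (all paths are assumptions):
# -i enters interactive mode, -r sets the reverse prompt that hands control
# back to the user, -f loads the Bob persona prompt.
bob_cmd() {
  model="${1:-models/7B/ggml-model-q4_0.bin}"
  echo "./main -m $model --color -i -r \"User:\" -f prompts/chat-with-bob.txt"
}

bob_cmd
```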