This repository contains a basic demonstration of text generation with LLMs such as Llama. Keep in mind that this is not a reference implementation and should never be used in production.

## Requirements
- Docker
- NVIDIA Container Toolkit
- At least 140GB VRAM (on one or more GPUs)
- NVIDIA Drivers and CUDA
- An LLM converted to `gguf` format, such as Llama 2 converted using the `convert.py` script from the llama.cpp repository: https://github.com/ggerganov/llama.cpp
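As a rough sketch of that conversion step (the model path below is hypothetical — substitute your own checkpoint directory), `convert.py` from llama.cpp is typically invoked like this:

```shell
# Fetch llama.cpp, which provides the convert.py script
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the original model weights to gguf; this typically writes
# ggml-model-f16.gguf into the model directory (hypothetical path shown)
python convert.py /home/$USER/llama/llama-2-70b-chat
```

The resulting `ggml-model-f16.gguf` file is what the container expects to find under `/var/model` in the run step below.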
## Usage

- Clone this repository: `git clone https://github.com/brianlechthaler/llama-docker-demo.git`
- Change directory to the cloned repository: `cd llama-docker-demo`
- Build the Docker image: `docker build -t llama-docker-demo .`
- Run the Docker image:

  ```shell
  docker run --gpus=all --cap-add SYS_RESOURCE -e USE_MLOCK=0 -e MODEL=/var/model/ggml-model-f16.gguf -v /home/$USER/llama/llama-2-70b-chat:/var/model -ti llama-docker-demo "what is a hello world?"
  ```
- Make sure to replace `/home/$USER/llama/llama-2-70b-chat` with the path to the folder containing your `gguf` model if it is located somewhere else.
- You can replace `"what is a hello world?"` with whatever prompt you want.
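For instance, assuming your model lives in a hypothetical host directory `/data/models/llama2` containing `ggml-model-f16.gguf`, the run command from above would become:

```shell
# Hypothetical host path on the left of -v; the container still reads
# the model from /var/model/ggml-model-f16.gguf as set by the MODEL variable
docker run --gpus=all --cap-add SYS_RESOURCE \
  -e USE_MLOCK=0 \
  -e MODEL=/var/model/ggml-model-f16.gguf \
  -v /data/models/llama2:/var/model \
  -ti llama-docker-demo "Explain Docker in one sentence."
```

Only the host side of the `-v` bind mount and the trailing prompt change; the in-container paths stay the same.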