# Setup showcase for running LLMs in Docker

This setup targets CPU inference. GPU support still needs research into llama.cpp build options.
- Docker and Docker Compose with at least 8 GB of memory allocated to them (and as many CPU cores as possible)
- Pull and build the llama.cpp container:

  ```bash
  docker compose up -d olmo
  ```

- Attach to the container shell:

  ```bash
  docker exec -it loki-olmo-1 /bin/bash
  ```

- Download the model:

  ```bash
  olmo_download
  ```

- Run a chat with the model (an HTTP example follows this list):

  ```bash
  olmo_run
  ```
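Besides the interactive chat, you can query the server over HTTP. A minimal sketch, assuming the `olmo` service uses the stock llama.cpp server image and is mapped to host port 8080 (see the Possible services section); the prompt text is only an example:

```bash
# Request a completion from the llama.cpp server behind the olmo service (host port 8080).
curl --location 'http://localhost:8080/completion' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Explain what a GGUF file is in one sentence.",
    "n_predict": 128
  }'
```

The response is JSON; the generated text should be in the `content` field.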
TODO: needs update

## Showcased models

- `OLMo-1.7-7B.IQ3_M`
- `llama-2-7b-chat.Q4_K_M`
- `qwen2-7b-instruct-q3_k_m`
- `gpt2.Q6_K`
- `zephyr-7b-beta.Q3_K_M`
- `Phi-3-mini-128k-instruct`
- `sdxl-flash` (plus VAE)
- `whisper`
- Download the showcased models (see the Showcased models section)
- Place your models into the `models` folder in the root of the project, under a folder named after the model (a download sketch follows these steps):

  ```
  models/<new_model>/<new_model>.gguf
  ```

- Bring the project up via Docker Compose:

  ```bash
  docker compose up <service_name>
  ```
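As an example of fetching a model file into the expected layout, here is a sketch using the Hugging Face CLI. The repository and file names are placeholders, not the exact sources used for the showcased models:

```bash
# Hypothetical example: download a GGUF file into models/<new_model>/
mkdir -p models/new_model
huggingface-cli download <hf_repo_id> <model_file>.gguf --local-dir models/new_model
```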
## Possible services

- `olmo` for `OLMo-1.7-7B.IQ3_M`. Port: 8080
- `llama` for `llama-2-7b-chat.Q4_K_M`. Port: 8081
- `qwen2` for `qwen2-7b-instruct-q3_k_m`. Port: 8082
- `gpt2` for `gpt2.Q6_K`. Port: 8083
- `zephyr` for `zephyr-7b-beta.Q3_K_M`. Port: 8084
- `phi3` for `Phi-3-mini-128k-instruct`. Port: 8085
- `stable_diffusion` for `sdxl-flash`. Port: 8086
- `whisper` for `whisper.cpp`. Port: 8087
You can also bring up all services at once (be careful with CPU and memory usage):

```bash
docker compose up
```
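To check that a text-model service is ready, you can hit the llama.cpp server health endpoint. A sketch, assuming the service uses the stock llama.cpp server image (as in the Dockerfiles here); replace the port with the one listed above for your service:

```bash
# Returns a small JSON status once the model has finished loading.
curl http://localhost:8080/health
```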
## Adding a new model

- Ensure `llama.cpp` supports the new model architecture
- Download the model in `gguf` (or any other supported format) into your `models` folder
- Create a Dockerfile for the new model:

  ```bash
  touch Dockerfile.new_model
  ```
- Set `<new_model_file_name>` in the new Dockerfile:

  ```dockerfile
  FROM ghcr.io/ggerganov/llama.cpp:server
  CMD ["-c", "2048", "-m", "models/<new_model_file_name>.gguf", "--port", "8080", "--host", "0.0.0.0"]
  ```
- Add the new service to `docker-compose.yml`:

  ```yaml
  services:
    olmo:
      ...
    <new_model>:
      build:
        dockerfile: Dockerfile.<new_model>
      ports:
        - 8088:8080
      volumes:
        - ./models/<new_model>/:/models/
  ```

  Ensure that the port is available (a verification sketch follows this list).
- Update the Showcased models and Possible services sections in `README.md`
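To verify the new service before relying on it, you can build and start it on its own and check the mapped port. A sketch, assuming the compose entry above (the `<new_model>` service name and host port `8088` are the placeholders from that snippet):

```bash
# Build and start only the new service
docker compose build <new_model>
docker compose up -d <new_model>

# The llama.cpp server reports readiness on /health once the model is loaded
curl http://localhost:8088/health
```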
cURL request to generate an image (the `stable_diffusion` service):

```bash
curl --location 'http://localhost:8086/api/v1/image' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "cat"
  }'
```

Currently only one endpoint is available, and only the `text` parameter is supported.
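For the `whisper` service on port 8087, a transcription request might look like the following. This is a sketch assuming the container exposes the stock whisper.cpp server `/inference` endpoint; the audio file path is a placeholder:

```bash
# Send an audio file to the whisper.cpp server for transcription.
# curl sets the multipart Content-Type (with boundary) automatically for --form.
curl --location 'http://localhost:8087/inference' \
  --form 'file=@./samples/audio.wav' \
  --form 'response_format=json'
```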