# Setup showcase for running LLMs in Docker

This setup targets CPU inference. GPU support still needs research into llama.cpp build options.
- Docker and Docker Compose with at least 8 GB of memory allocated to them (and as many CPU cores as possible)
- Pull and build the llama.cpp container:

  ```bash
  docker compose up -d olmo
  ```

- Attach to the container shell:

  ```bash
  docker exec -it loki-olmo-1 /bin/bash
  ```

- Download the model:

  ```bash
  olmo_download
  ```

- Run a chat with the model (an HTTP example follows this list):

  ```bash
  olmo_run
  ```
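Besides the interactive chat, you can query the server over HTTP. A minimal sketch, assuming the `olmo` service uses the stock llama.cpp server image and is mapped to host port 8080 (see the Possible services section); the prompt text is only an example:

```bash
# Request a completion from the llama.cpp server behind the olmo service (host port 8080).
curl --location 'http://localhost:8080/completion' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "Explain what a GGUF file is in one sentence.",
    "n_predict": 128
  }'
```

The response is JSON; the generated text should be in the `content` field.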
TODO: needs update

## Showcased models

- `OLMo-1.7-7B.IQ3_M`
- `llama-2-7b-chat.Q4_K_M`
- `qwen2-7b-instruct-q3_k_m`
- `gpt2.Q6_K`
- `zephyr-7b-beta.Q3_K_M`
- `Phi-3-mini-128k-instruct`
- `sdxl-flash` (plus VAE)
- `whisper`
- Download the showcased models (see the Showcased models section)
- Place your models into the `models` folder in the root of the project, under a folder named after the model (a download sketch follows these steps):

  ```
  models/<new_model>/<new_model>.gguf
  ```

- Bring the project up via Docker Compose:

  ```bash
  docker compose up <service_name>
  ```
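As an example of fetching a model file into the expected layout, here is a sketch using the Hugging Face CLI. The repository and file names are placeholders, not the exact sources used for the showcased models:

```bash
# Hypothetical example: download a GGUF file into models/<new_model>/
mkdir -p models/new_model
huggingface-cli download <hf_repo_id> <model_file>.gguf --local-dir models/new_model
```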
## Possible services

- `olmo` for `OLMo-1.7-7B.IQ3_M`. Port: 8080
- `llama` for `llama-2-7b-chat.Q4_K_M`. Port: 8081
- `qwen2` for `qwen2-7b-instruct-q3_k_m`. Port: 8082
- `gpt2` for `gpt2.Q6_K`. Port: 8083
- `zephyr` for `zephyr-7b-beta.Q3_K_M`. Port: 8084
- `phi3` for `Phi-3-mini-128k-instruct`. Port: 8085
- `stable_diffusion` for `sdxl-flash`. Port: 8086
- `whisper` for `whisper.cpp`. Port: 8087
You can also bring up all services at once (be careful with CPU and memory usage):

```bash
docker compose up
```
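To check that a text-model service is ready, you can hit the llama.cpp server health endpoint. A sketch, assuming the service uses the stock llama.cpp server image (as in the Dockerfiles here); replace the port with the one listed above for your service:

```bash
# Returns a small JSON status once the model has finished loading.
curl http://localhost:8080/health
```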
## Adding a new model

- Ensure `llama.cpp` supports the new model architecture
- Download the model in `gguf` (or any other supported format) into your `models` folder
- Create a Dockerfile for the new model:

  ```bash
  touch Dockerfile.new_model
  ```
- Set `<new_model_file_name>` in the new Dockerfile:

  ```dockerfile
  FROM ghcr.io/ggerganov/llama.cpp:server
  CMD ["-c", "2048", "-m", "models/<new_model_file_name>.gguf", "--port", "8080", "--host", "0.0.0.0"]
  ```
- Add the new service to `docker-compose.yml`:

  ```yaml
  services:
    olmo:
      ...
    <new_model>:
      build:
        dockerfile: Dockerfile.<new_model>
      ports:
        - 8088:8080
      volumes:
        - ./models/<new_model>/:/models/
  ```

  Ensure that the port is available (a verification sketch follows this list).
- Update the Showcased models and Possible services sections in `README.md`
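To verify the new service before relying on it, you can build and start it on its own and check the mapped port. A sketch, assuming the compose entry above (the `<new_model>` service name and host port `8088` are the placeholders from that snippet):

```bash
# Build and start only the new service
docker compose build <new_model>
docker compose up -d <new_model>

# The llama.cpp server reports readiness on /health once the model is loaded
curl http://localhost:8088/health
```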
cURL request to generate an image (the `stable_diffusion` service):

```bash
curl --location 'http://localhost:8086/api/v1/image' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "cat"
  }'
```

Currently only one endpoint is available, and only the `text` parameter is supported.
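For the `whisper` service on port 8087, a transcription request might look like the following. This is a sketch assuming the container exposes the stock whisper.cpp server `/inference` endpoint; the audio file path is a placeholder:

```bash
# Send an audio file to the whisper.cpp server for transcription.
# curl sets the multipart Content-Type (with boundary) automatically for --form.
curl --location 'http://localhost:8087/inference' \
  --form 'file=@./samples/audio.wav' \
  --form 'response_format=json'
```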