Description

A setup showcase for running LLMs in Docker.

The setup targets CPU only. GPU support would require researching llama.cpp build options.

Requirements

  • Docker and Docker Compose with at least 8 GB of memory allocated to them (and as many CPU cores as possible)
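
You can check how much memory and CPU the Docker engine actually has available; the format fields below assume a reasonably recent Docker version:

docker info --format 'Memory: {{.MemTotal}} bytes, CPUs: {{.NCPU}}'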

How to run OLMo

  1. Pull and build the llama.cpp container:
docker compose up -d olmo
  2. Attach to the container shell:
docker exec -it loki-olmo-1 /bin/bash
  3. Download the model:
olmo_download
  4. Run a chat with the model:
olmo_run
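
If the chat does not start, it can help to confirm the container is running and inspect its logs first (the container name loki-olmo-1 comes from the project directory name; adjust it to whatever docker compose ps reports on your machine):

docker compose ps olmo
docker logs -f loki-olmo-1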

OLD info

TODO: needs updating

Showcased models

Up and running

  1. Download the showcased models (see the Showcased models section)
  2. Place your models into the models folder in the project root, each under its own model-name folder:
models/<new_model>/<new_model>.gguf
  3. Start the project via Docker Compose:
docker compose up <service_name>

Possible services:

  • olmo for OLMo-1.7-7B.IQ3_M. Port: 8080
  • llama for llama-2-7b-chat.Q4_K_M. Port: 8081
  • qwen2 for qwen2-7b-instruct-q3_k_m. Port: 8082
  • gpt2 for gpt2.Q6_K. Port: 8083
  • zephyr for zephyr-7b-beta.Q3_K_M. Port: 8084
  • phi3 for Phi-3-mini-128k-instruct. Port: 8085
  • stable_diffusion for sdxl-flash. Port: 8086
  • whisper for whisper.cpp. Port: 8087

You can also start all services at once (be mindful of CPU and memory usage):

docker compose up
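
After a service starts, you can check that its llama.cpp server answers on the mapped port. The /health endpoint below exists in recent llama.cpp server builds; if your image predates it, open the root URL in a browser instead:

curl http://localhost:8080/health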

Usage example

Example from llama-cpp docs
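
For a quick local smoke test, a minimal completion request against one of the services looks roughly like this (the /completion endpoint with prompt and n_predict fields is the llama.cpp server API; swap the port for the service you started):

curl --location 'http://localhost:8080/completion' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 128
}'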

How to add new model as a new service

  1. Ensure llama-cpp supports the new model architecture
  2. Download the model in GGUF (or another supported) format to your models folder
  3. Create a Dockerfile for the new model:
touch Dockerfile.new_model
  4. Set <new_model_file_name> in the new Dockerfile:
FROM ghcr.io/ggerganov/llama.cpp:server

CMD ["-c", "2048", "-m", "models/<new_model_file_name>.gguf", "--port", "8080", "--host", "0.0.0.0"]
  5. Add the new service to docker-compose.yml:
services:
  olmo:
    ...
  <new_model>:
    build:
      dockerfile: Dockerfile.<new_model>
    ports:
      - 8088:8080
    volumes:
      - ./models/<new_model>/:/models/

Ensure that the chosen host port is available.

  6. Update the Showcased models and Possible services sections in README.md
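
As a sanity check after these steps, the new service can be started and probed on its mapped host port (the port 8088 below matches the compose snippet above; the /health endpoint assumes a recent llama.cpp server image):

docker compose up -d <new_model>
curl http://localhost:8088/health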

How to use stable diffusion model

Example cURL request to generate an image:

curl --location 'http://localhost:8086/api/v1/image' \
--header 'Content-Type: application/json' \
--data '{
    "text": "cat"
}'

Currently, only one endpoint is available, and only the text parameter is supported.
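
If the endpoint returns the generated image directly in the response body (an assumption here; the sdxl-flash service may wrap it in JSON instead), the result can be saved straight to a file:

curl --location 'http://localhost:8086/api/v1/image' \
--header 'Content-Type: application/json' \
--data '{
    "text": "cat"
}' \
--output cat.png

If the response turns out to be JSON, drop --output and inspect the payload first.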