This repo contains scripts for building Docker images of theroyallab/tabbyAPI, suitable for RunPod or local use.
Image tags:

- 12.4.1-runtime-ubuntu22.04-runpod
- 12.3.2-runtime-ubuntu22.04-runpod
- 12.2.2-runtime-ubuntu22.04-runpod
- 12.1.0-runtime-ubuntu22.04-runpod
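For local use, one of the built images can be started with Docker's NVIDIA GPU support. The command below is only a rough sketch: the repository name yourrepo/tabbyapi is a placeholder for wherever the image was tagged or pushed, the mounted host path is assumed to hold the models and config.yml, and port 7000 matches the example configuration further down.

$ docker run --gpus all \
    -p 7000:7000 \
    -v /path/to/models:/app/models \
    yourrepo/tabbyapi:12.4.1-runtime-ubuntu22.04-runpod  # repository name is a placeholder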
A 2x 48 GB GPU system is required to follow this example.
$ cd /app/models; \
  HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli \
    download \
    turboderp/Llama-3-70B-Instruct-exl2 \
    --local-dir turboderp_Llama-3-70B-Instruct-exl2_6.0bpw \
    --revision 6.0bpw \
    --cache-dir /app/models/.cache

URL: Hugging Face repository turboderp/Llama-3-70B-Instruct-exl2
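Once the download finishes, the weights should be present under the --local-dir path used above; a quick check:

$ ls -lh /app/models/turboderp_Llama-3-70B-Instruct-exl2_6.0bpw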
Adapt /app/models/config.yml to use this model.
network:
  host: 0.0.0.0
  port: 7000
  disable_auth: False
logging:
  prompt: False
  generation_params: False
sampling:
developer:
model:
  max_seq_len: 32768
  model_dir: models
  model_name: turboderp_Llama-3-70B-Instruct-exl2_6.0bpw
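  # gpu_split_auto is disabled so the manual gpu_split below is used;
  # gpu_split lists the VRAM to allocate per GPU in GB (here ~25 GB on the
  # first and ~47 GB on the second 48 GB card), and cache_mode: Q4 selects a
  # 4-bit quantized KV cache to reduce VRAM usage.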
  gpu_split_auto: False
  gpu_split: [25, 47]
  cache_mode: Q4
  fasttensors: true

Restart tabbyAPI so the new configuration is loaded:

$ restart.sh

Follow the startup log:

$ tail -f /app/tabbyAPI.log

The API key appears in the log:

$ grep API /app/tabbyAPI.log

Insert the API key into the authorization header.
$ curl -s http://localhost:7000/v1/chat/completions \
    -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer d160430598b33ef9acd98235d12dc3ae" \
    -d '{
      "model": "turboderp_Llama-3-70B-Instruct-exl2_6.0bpw",
      "messages": [
        {
          "role": "user",
          "content": "Compose a poem that explains the concept of recursion in programming."
        }
      ]
    }'
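To verify the key and see which model is loaded, the model listing route can be queried with the same header, assuming the server exposes the OpenAI-compatible /v1/models endpoint (port and key taken from the example above):

$ curl -s http://localhost:7000/v1/models \
    -H "Authorization: Bearer d160430598b33ef9acd98235d12dc3ae"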
Templates:

- tabbyAPI RunPod template for CUDA 12.1, derived from nvidia/cuda:12.1.0-runtime-ubuntu22.04
- tabbyAPI RunPod template for CUDA 12.2, derived from nvidia/cuda:12.2.2-runtime-ubuntu22.04
- tabbyAPI RunPod template for CUDA 12.3, derived from nvidia/cuda:12.3.2-runtime-ubuntu22.04
- tabbyAPI RunPod template for CUDA 12.4, derived from nvidia/cuda:12.4.1-runtime-ubuntu22.04
Inspiration and scripts from the following projects were used.
Last changed: 2024-05-25