resemble-ai/chatterbox

Initialization Failed CUDA error: no kernel image is available for execution on the device

Opened this issue · 2 comments

Linux: Ubuntu 24.04 LTS
CUDA: 12.9
GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)

When running in a Docker container, I get this error on the page itself, with no errors in the terminal:

```
Initialization Failed
CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
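For context on what this error usually means: the RTX 5060 Ti is a Blackwell-generation card (CUDA compute capability 12.0, i.e. `sm_120`), and "no kernel image is available" typically indicates the installed PyTorch wheel was not built with kernels for that architecture. The helper below is only an illustrative sketch of the check; the real API calls are `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`:

```python
def has_kernels_for(arch_list, capability):
    """Rough check: does a torch build cover a GPU's compute capability?

    arch_list  -- e.g. torch.cuda.get_arch_list() -> ['sm_80', 'sm_90', 'compute_90']
    capability -- e.g. torch.cuda.get_device_capability(0) -> (12, 0)
    """
    cc = capability[0] * 10 + capability[1]
    for entry in arch_list:
        kind, _, num = entry.partition("_")
        if kind == "sm" and int(num) == cc:
            return True   # native (SASS) kernels for exactly this GPU
        if kind == "compute" and int(num) <= cc:
            return True   # PTX that the driver can JIT-compile forward
    return False

# Inside the container, the real check would be along the lines of:
#   import torch
#   has_kernels_for(torch.cuda.get_arch_list(),
#                   torch.cuda.get_device_capability(0))
# A 5060 Ti needs 'sm_120' (or forward-compatible PTX) in that list.
```

If `sm_120` is missing from `torch.cuda.get_arch_list()` inside the container, the fix is a wheel actually built for Blackwell (as far as I know, PyTorch 2.7+ wheels from the cu128/cu129 indexes include it), not a Docker change.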

I launch the container via the following command:

```shell
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up --build --force-recreate -d  # --force-recreate can be left out
```

Attached are the docker files used to launch it.

docker-compose.uv.gpu.yml

Dockerfile.uv.gpu.txt

The Dockerfile.uv.gpu.txt was originally Dockerfile.uv.gpu, but due to GitHub's restrictions on upload file types I had to make a .txt copy.

For those who do not wish to download the files, here are their contents:
docker-compose.uv.gpu.yml:

```yaml
services:
  # Main API Service (always included)
  chatterbox-tts:
    build:
      context: ..
      dockerfile: docker/Dockerfile.uv.gpu
    container_name: chatterbox-tts-api-uv-gpu
    ports:
      - '${PORT:-4123}:${PORT:-4123}'
    environment:
      # API Configuration
      - PORT=${PORT:-4123}
      - HOST=${HOST:-0.0.0.0}

      # TTS Model Settings
      - EXAGGERATION=${EXAGGERATION:-0.5}
      - CFG_WEIGHT=${CFG_WEIGHT:-0.5}
      - TEMPERATURE=${TEMPERATURE:-0.8}

      # Text Processing
      - MAX_CHUNK_LENGTH=${MAX_CHUNK_LENGTH:-280}
      - MAX_TOTAL_LENGTH=${MAX_TOTAL_LENGTH:-3000}

      # Voice and Model Settings
      - VOICE_SAMPLE_PATH=/app/voice-sample.mp3
      - DEVICE=${DEVICE:-cuda}
      - MODEL_CACHE_DIR=${MODEL_CACHE_DIR:-/cache}
      - VOICE_LIBRARY_DIR=${VOICE_LIBRARY_DIR:-/voices}

      # NVIDIA/CUDA settings
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      # Mount voice sample file (optional)
      - ${VOICE_SAMPLE_HOST_PATH:-../voice-sample.mp3}:/app/voice-sample.mp3:ro

      # Mount model cache for persistence
      - chatterbox-models:${MODEL_CACHE_DIR:-/cache}

      # Mount voice library for persistence
      - chatterbox-voices:${VOICE_LIBRARY_DIR:-/voices}

      # Optional: Mount custom voice samples directory (legacy)
      - ${VOICE_SAMPLES_DIR:-../voice-samples}:/app/voice-samples:ro

    # GPU support (enabled by default for this compose file)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

    restart: unless-stopped

    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:${PORT:-4123}/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 300s

  # Frontend Service with integrated proxy (optional - requires 'frontend' profile)
  frontend:
    profiles: ['frontend', 'ui', 'fullstack']
    build:
      context: ../frontend
      dockerfile: Dockerfile
    container_name: chatterbox-tts-frontend
    ports:
      - '${FRONTEND_PORT:-4321}:80' # Frontend serves on port 80 internally
    depends_on:
      - chatterbox-tts
    restart: unless-stopped

volumes:
  chatterbox-models:
    driver: local
  chatterbox-voices:
    driver: local
```

Dockerfile.uv.gpu.txt:

```dockerfile
# Use NVIDIA CUDA runtime as base for better GPU support
FROM nvidia/cuda:12.9.0-runtime-ubuntu24.04

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV DEBIAN_FRONTEND=noninteractive

# Install Python 3.11 and system dependencies
RUN apt-get update && apt-get install -y \
    software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y \
    python3.11 \
    python3.11-dev \
    python3.11-distutils \
    git \
    wget \
    curl \
    build-essential \
    ffmpeg \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Set Python 3.11 as default
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Set working directory
WORKDIR /app

# Create virtual environment
RUN uv venv --python 3.11

# Install PyTorch with CUDA support using uv
RUN uv pip install --no-cache-dir --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129

# Install base dependencies first
RUN uv pip install setuptools fastapi uvicorn[standard] python-dotenv python-multipart requests psutil

# Install resemble-perth specifically (required for watermarker)
RUN uv pip install resemble-perth

# Install chatterbox-tts using uv
RUN uv pip install chatterbox-tts

# Copy application code
COPY app/ ./app/
COPY main.py ./

# Copy voice sample if it exists (optional, can be mounted)
COPY voice-sample.mp3 ./voice-sample.mp3

# Create directories for model cache and voice library (separate from source code)
RUN mkdir -p /cache /voices

# Set default environment variables (prefer CUDA)
ENV PORT=4123
ENV EXAGGERATION=0.5
ENV CFG_WEIGHT=0.5
ENV TEMPERATURE=0.8
ENV VOICE_SAMPLE_PATH=/app/voice-sample.mp3
ENV MAX_CHUNK_LENGTH=280
ENV MAX_TOTAL_LENGTH=3000
ENV DEVICE=cuda
ENV MODEL_CACHE_DIR=/cache
ENV VOICE_LIBRARY_DIR=/voices
ENV HOST=0.0.0.0

# NVIDIA/CUDA environment variables
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Add uv venv to PATH
ENV PATH="/app/.venv/bin:$PATH"

# Expose port
EXPOSE ${PORT}

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5m --retries=3 \
    CMD curl -f http://localhost:${PORT}/health || exit 1

# Run the application using the new entry point
CMD ["python", "main.py"]
```
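With the stack up, two sanity checks can split the problem between the container runtime and the torch wheel. These are a sketch assuming the compose file above (the container name comes from its `container_name`), and they obviously need a Docker host with the NVIDIA Container Toolkit installed:

```shell
# 1) Does the NVIDIA runtime expose the GPU to this base image at all?
docker run --rm --gpus all nvidia/cuda:12.9.0-runtime-ubuntu24.04 nvidia-smi

# 2) If it does, which architectures was the installed torch wheel built for?
docker exec chatterbox-tts-api-uv-gpu \
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_arch_list())"
```

If (1) fails, the problem is the host's NVIDIA Container Toolkit setup; if (1) works but the arch list from (2) lacks `sm_120`, the wheel is the problem despite the cu129 index URL.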

Can't get it to work on the same GPU; I cannot get torch to detect the GPU at all with these NVIDIA container images.