Initialization Failed CUDA error: no kernel image is available for execution on the device
- OS: Ubuntu 24.04 LTS
- CUDA: 12.9
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
When running in a Docker container, I get this error on the page itself, with no errors in the terminal:

```
Initialization Failed
CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
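For context (my assumption, not confirmed by the project): this error generally means the installed PyTorch wheel was not compiled for the GPU's compute capability. The RTX 5060 Ti is a Blackwell-class card (compute capability 12.0, i.e. `sm_120`), so a wheel built only up to `sm_90` cannot serve it. A minimal sketch of the mismatch — the function is hypothetical; on a real install the two inputs come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`:

```python
# Illustrative sketch (not the project's code): a PyTorch wheel ships compiled
# kernels only for the compute capabilities it was built with. On a real
# install, the two inputs below come from torch.cuda.get_device_capability()
# and torch.cuda.get_arch_list().

def has_kernel_image(device_capability, wheel_arch_list):
    """True if the wheel ships a binary kernel for this exact architecture.

    (CUDA can also JIT forward-compatible PTX, but when a wheel ships neither
    a matching sm_XY binary nor usable PTX, you get exactly
    "no kernel image is available for execution on the device".)
    """
    major, minor = device_capability
    return f"sm_{major}{minor}" in wheel_arch_list

# An older wheel built only up to sm_90 cannot serve a Blackwell card (CC 12.0):
print(has_kernel_image((12, 0), ["sm_80", "sm_86", "sm_90"]))    # False
# A cu128/cu129 wheel that includes sm_120 can:
print(has_kernel_image((12, 0), ["sm_90", "sm_100", "sm_120"]))  # True
```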
I launch the container via the following command:
```shell
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up --build --force-recreate -d
# (--force-recreate can be left out)
```
Attached are the Docker files used to launch it. `Dockerfile.uv.gpu.txt` was originally `Dockerfile.uv.gpu`, but due to GitHub's restrictions on upload file types I had to make a `.txt` copy.
For those who do not wish to download the files, here are their contents:
docker-compose.uv.gpu.yml:

```yaml
services:
  # Main API service (always included)
  chatterbox-tts:
    build:
      context: ..
      dockerfile: docker/Dockerfile.uv.gpu
    container_name: chatterbox-tts-api-uv-gpu
    ports:
      - '${PORT:-4123}:${PORT:-4123}'
    environment:
      # API configuration
      - PORT=${PORT:-4123}
      - HOST=${HOST:-0.0.0.0}
      # TTS model settings
      - EXAGGERATION=${EXAGGERATION:-0.5}
      - CFG_WEIGHT=${CFG_WEIGHT:-0.5}
      - TEMPERATURE=${TEMPERATURE:-0.8}
      # Text processing
      - MAX_CHUNK_LENGTH=${MAX_CHUNK_LENGTH:-280}
      - MAX_TOTAL_LENGTH=${MAX_TOTAL_LENGTH:-3000}
      # Voice and model settings
      - VOICE_SAMPLE_PATH=/app/voice-sample.mp3
      - DEVICE=${DEVICE:-cuda}
      - MODEL_CACHE_DIR=${MODEL_CACHE_DIR:-/cache}
      - VOICE_LIBRARY_DIR=${VOICE_LIBRARY_DIR:-/voices}
      # NVIDIA/CUDA settings
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      # Mount voice sample file (optional)
      - ${VOICE_SAMPLE_HOST_PATH:-../voice-sample.mp3}:/app/voice-sample.mp3:ro
      # Mount model cache for persistence
      - chatterbox-models:${MODEL_CACHE_DIR:-/cache}
      # Mount voice library for persistence
      - chatterbox-voices:${VOICE_LIBRARY_DIR:-/voices}
      # Optional: mount custom voice samples directory (legacy)
      - ${VOICE_SAMPLES_DIR:-../voice-samples}:/app/voice-samples:ro
    # GPU support (enabled by default for this compose file)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:${PORT:-4123}/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 300s

  # Frontend service with integrated proxy (optional - requires 'frontend' profile)
  frontend:
    profiles: ['frontend', 'ui', 'fullstack']
    build:
      context: ../frontend
      dockerfile: Dockerfile
    container_name: chatterbox-tts-frontend
    ports:
      - '${FRONTEND_PORT:-4321}:80' # Frontend serves on port 80 internally
    depends_on:
      - chatterbox-tts
    restart: unless-stopped

volumes:
  chatterbox-models:
    driver: local
  chatterbox-voices:
    driver: local
```
Dockerfile.uv.gpu (attached as Dockerfile.uv.gpu.txt):

```dockerfile
# Use NVIDIA CUDA runtime as base for better GPU support
FROM nvidia/cuda:12.9.0-runtime-ubuntu24.04

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV DEBIAN_FRONTEND=noninteractive

# Install Python 3.11 and system dependencies
RUN apt-get update && apt-get install -y \
    software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y \
    python3.11 \
    python3.11-dev \
    python3.11-distutils \
    git \
    wget \
    curl \
    build-essential \
    ffmpeg \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Set Python 3.11 as default
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Set working directory
WORKDIR /app

# Create virtual environment
RUN uv venv --python 3.11

# Install PyTorch with CUDA support using uv
RUN uv pip install --no-cache-dir --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129

# Install base dependencies first
RUN uv pip install setuptools fastapi uvicorn[standard] python-dotenv python-multipart requests psutil

# Install resemble-perth specifically (required for watermarker)
RUN uv pip install resemble-perth

# Install chatterbox-tts using uv
RUN uv pip install chatterbox-tts

# Copy application code
COPY app/ ./app/
COPY main.py ./

# Copy voice sample if it exists (optional, can be mounted)
COPY voice-sample.mp3 ./voice-sample.mp3

# Create directories for model cache and voice library (separate from source code)
RUN mkdir -p /cache /voices

# Set default environment variables (prefer CUDA)
ENV PORT=4123
ENV EXAGGERATION=0.5
ENV CFG_WEIGHT=0.5
ENV TEMPERATURE=0.8
ENV VOICE_SAMPLE_PATH=/app/voice-sample.mp3
ENV MAX_CHUNK_LENGTH=280
ENV MAX_TOTAL_LENGTH=3000
ENV DEVICE=cuda
ENV MODEL_CACHE_DIR=/cache
ENV VOICE_LIBRARY_DIR=/voices
ENV HOST=0.0.0.0

# NVIDIA/CUDA environment variables
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Add uv venv to PATH
ENV PATH="/app/.venv/bin:$PATH"

# Expose port
EXPOSE ${PORT}

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5m --retries=3 \
    CMD curl -f http://localhost:${PORT}/health || exit 1

# Run the application using the new entry point
CMD ["python", "main.py"]
```
Can't get it to work on the same GPU.

I cannot get torch to detect the GPU at all on these NVIDIA container images.
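To narrow down whether this is a GPU-passthrough problem or a wheel/architecture mismatch, two diagnostic commands may help. This is a sketch: the container name is taken from the compose file above, and I am assuming a `12.9.0-base-ubuntu24.04` tag exists alongside the runtime tag used in the Dockerfile.

```shell
# 1. Check that the NVIDIA container runtime passes the GPU through at all
#    (same image family as the Dockerfile's base image):
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu24.04 nvidia-smi

# 2. Check which architectures the installed torch wheel actually ships
#    kernels for; an RTX 5060 Ti needs sm_120 to appear in this list:
docker exec chatterbox-tts-api-uv-gpu python -c \
  "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_arch_list())"
```

If step 1 fails, the problem is the host's NVIDIA driver or container toolkit; if step 1 works but step 2 shows no `sm_120`, the wheel itself lacks kernels for the card.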