Running without a GPU
Sharpz7 opened this issue · 9 comments
Hey,
I wanted to check: is it possible to run this container without a GPU?
Thanks,
You sure can, and there are some instructions in #9 that should help you set it up - basically, just comment out all the GPU parts in the `docker-compose.yml` (or don't include `--gpus all` if you're running without compose). You'll need to be a patient man though - it's slow as molasses without a GPU!
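For reference, running without compose might look something like this (a minimal sketch - the image tag, port, volume, and launch args are taken from this repo's compose file; adjust the paths to your setup):

```sh
# CPU-only: same as a GPU run, just without the `--gpus all` flag
docker run -it --rm \
  -p 7860:7860 \
  -e EXTRA_LAUNCH_ARGS="--listen --verbose" \
  -v "$(pwd)/config/models:/app/models" \
  atinoda/text-generation-webui:default
```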
This didn't seem to work in my environment - it throws an error saying it can't find a GPU when you load a model. I will try some things and get back to you.
I managed to get it by using this guide: https://github.com/oobabooga/text-generation-webui/blob/main/docs/Low-VRAM-guide.md
And making this change:

```yml
command: ["python", "/app/server.py", "--auto-devices"]
```

Here is my full `docker-compose.yml` for reference:

```yml
version: "3"
services:
  text-generation-webui-docker:
    image: atinoda/text-generation-webui:default  # Specify variant as the :tag
    container_name: text-generation-webui
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose"  # Custom launch args (e.g., --model MODEL_NAME)
      # - BUILD_EXTENSIONS_LIVE="silero_tts whisper_stt"  # Install named extensions during every container launch. THIS WILL SIGNIFICANTLY SLOW LAUNCH TIME.
    ports:
      - 7860:7860  # Default web port
      # - 5000:5000  # Default API port
      # - 5005:5005  # Default streaming port
      # - 5001:5001  # Default OpenAI API extension port
    volumes:
      - ./config/loras:/app/loras
      - ./config/models:/app/models
      - ./config/presets:/app/presets
      - ./config/prompts:/app/prompts
      - ./config/softprompts:/app/softprompts
      - ./config/training:/app/training
      # - ./config/extensions:/app/extensions  # Persist all extensions
      # - ./config/extensions/silero_tts:/app/extensions/silero_tts  # Persist a single extension
    logging:
      driver: json-file
      options:
        max-file: "3"    # Maximum number of log files to keep
        max-size: "10m"  # Maximum size per log file
    command: ["python", "/app/server.py", "--auto-devices"]
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           device_ids: ['0']
    #           capabilities: [gpu]
```
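With that file in place, the stack starts as usual (standard compose commands, nothing specific to this repo):

```sh
# Start the service in the background, then follow its logs
docker compose up -d
docker compose logs -f text-generation-webui-docker
```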
Thanks for sharing your fix and confirming that it works with CPU only on your system. Enjoy your LLM-ing, and make sure your CPU cooler is tuned up!
PS. You can append `--auto-devices` to the `EXTRA_LAUNCH_ARGS` environment variable, instead of editing the `CMD`.
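For example, the environment block in the compose file above would then look like this (same file, with `--auto-devices` appended and no separate `command:` override needed):

```yml
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose --auto-devices"  # --auto-devices lets the loader fall back to CPU
```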
I also realised I was being silly - you can configure it from the settings:
https://drive.google.com/uc?id=1UEjDNVtbBh4oAdb4k_WJHPYpdpSXI2Kj
Thanks for the quick response. Looking forward to doing my LLM testing with this UI :))
If you would be interested in having a Helm chart in this repo as well, I'd be happy to contribute.
> You sure can, and there's some instructions in #9 that should help you set it up - basically, just comment out all the gpu parts in the `docker-compose.yml` (or don't include `--gpus all` if you're running without compose). You'll need to be a patient man though - it's slow as molasses without a GPU!
Hi @Atinoda, does "running without a GPU" assume that you also use the provided Dockerfile? IMHO, the CUDA base image used there cannot be scheduled on a machine without a GPU?
Hi @Atinoda,
I could start the app with the new image (I adapted a few things for my setup, as I do not use Docker Compose but Azure infrastructure), but after downloading a GGML model, the load_model process says:
```
2023-08-22 08:19:23 INFO:Loading TheBloke_Llama-2-7B-Chat-GGML...
CUDA error 35 at ggml-cuda.cu:4883: CUDA driver version is insufficient for CUDA runtime version
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
Stream closed EOF for customer-dev/claims-sle-textgen-ui-bash-684c9488c6-g4rxk (textgen-webui)
```
Hey,
I was wondering if iGPU inference is a thing?
I'm not sure if there would be any gains over the CPU, but I'm curious.
I haven't found a way to make it work.