GPU support on event worker
chrisbward commented
Hi,
As discussed with Fahad on Discord, the kairon worker is using the CPU rather than the GPU inside the Docker container.
I ran some tests to make sure it wasn't a problem on my side:
version: "3"
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
and
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Both worked fine and detected my GPU.
I added the same config to kairon-worker:
deploy:
  resources:
    reservations:
      devices:
        - capabilities: [gpu]
but this did not seem to make a difference.
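To narrow down whether the container was handed a GPU at all, a quick check could be run inside the worker container. This is only a rough sketch and not part of kairon: the function name `gpu_diagnostics` is made up for illustration, and it assumes the NVIDIA container runtime exposes the GPU as `/dev/nvidia*` device nodes and sets `NVIDIA_VISIBLE_DEVICES`.

```python
import glob
import os
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic signs that a GPU was passed into this container.

    Hypothetical helper for illustration; not part of kairon.
    """
    info = {
        # Device nodes such as /dev/nvidia0 appear when a GPU is exposed.
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        # Assumed to be set by the NVIDIA container runtime; None if absent.
        "visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Path to nvidia-smi if it is available in the container, else None.
        "nvidia_smi": shutil.which("nvidia-smi"),
        "nvidia_smi_output": None,
    }
    if info["nvidia_smi"]:
        try:
            result = subprocess.run(
                [info["nvidia_smi"], "-L"],  # -L lists the detected GPUs
                capture_output=True,
                text=True,
                timeout=30,
            )
            info["nvidia_smi_output"] = (
                result.stdout.strip() or result.stderr.strip()
            )
        except (OSError, subprocess.SubprocessError) as exc:
            info["nvidia_smi_output"] = f"failed to run: {exc}"
    return info


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `device_nodes` is empty and `nvidia_smi` is missing, the GPU reservation probably never reached the container. If both are present, the container can see the GPU and the gap is more likely in what the image has installed.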
I checked the Dockerfile for the worker and noticed that no GPU packages or drivers are installed in the image.
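One way to confirm this from inside the image is to ask the ML framework directly. The sketch below assumes the worker trains with TensorFlow, which I have not verified for this image; `tensorflow_gpu_report` is a hypothetical name, and it returns `None` when TensorFlow is not installed.

```python
def tensorflow_gpu_report():
    """Return TensorFlow's view of the GPU, or None if TensorFlow is missing.

    Hypothetical helper; assumes the worker image relies on TensorFlow.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        # False means this TensorFlow build has no CUDA support compiled in.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Empty list means TensorFlow could not find a usable GPU.
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

A report with `built_with_cuda` set to `False` would point at the installed TensorFlow build, while `True` with an empty `gpus` list would point at missing CUDA libraries or the device not being passed through.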
I found some steps here that someone could implement:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html
chrisbward commented