digiteinfotech/kairon

GPU support on event worker

Hi,

As discussed with Fahad on Discord, the kairon worker is using the CPU rather than the GPU inside the Docker container.

I ran some tests to make sure it wasn't a problem on my side:

version: "3"
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

and

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Both worked fine and detected my GPU.

I added the same config to kairon-worker:

    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

but this did not seem to make a difference.

I then checked the Dockerfile for the worker and noticed that the image has no GPU packages or drivers installed.
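
To tell a missing device reservation apart from missing libraries in the image, a small check like the one below could be run inside the worker container. This is only a sketch: the function name `gpu_report` is made up for illustration, and it assumes the worker trains through TensorFlow (as in the first test above) and that `nvidia-smi` is on the PATH whenever the NVIDIA runtime has exposed the GPU.

```python
import importlib.util
import shutil
import subprocess


def gpu_report():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; it is not part of kairon.
    """
    report = {
        "nvidia_smi_found": False,
        "nvidia_smi_gpus": [],
        "tensorflow_installed": False,
        "tensorflow_gpus": [],
    }

    # Step 1: is the GPU exposed to the container at all?
    smi = shutil.which("nvidia-smi")
    if smi:
        report["nvidia_smi_found"] = True
        try:
            # "nvidia-smi -L" prints one line per GPU it can see.
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
        except (OSError, subprocess.TimeoutExpired):
            result = None
        if result is not None and result.returncode == 0:
            report["nvidia_smi_gpus"] = [
                line for line in result.stdout.splitlines() if line.strip()
            ]

    # Step 2: can TensorFlow inside the image actually use a GPU?
    if importlib.util.find_spec("tensorflow") is not None:
        report["tensorflow_installed"] = True
        try:
            import tensorflow as tf

            report["tensorflow_gpus"] = [
                device.name for device in tf.config.list_physical_devices("GPU")
            ]
        except Exception:
            # A broken install or missing CUDA libraries ends up here.
            pass

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_gpus` lists a device but `tensorflow_gpus` is empty, the reservation is working and the gap is most likely in the image itself. If both are empty, the device is probably not reaching the container.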

I found some steps here that someone could implement:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html