cortexlabs/cortex

GPU not detected for Realtime API

creatorrr opened this issue · 5 comments

Version

0.39.1

Description

I tried to run a GPU Realtime API on Cortex 0.39, but the GPU is not being detected for some reason. Running the same image on a local machine works fine.

Configuration

- name: scratch
  kind: RealtimeAPI
  pod:
    port: 8000
    containers:
    - name: api
      image: nricklin/ubuntu-gpu-test
      compute:
        gpu: 1

Expected behavior

[I] ➜ docker run --gpus all --rm nricklin/ubuntu-gpu-test

Number of CUDA Devices = 1
===========================
Device 0 has name Quadro P2000 with compute capability 6.1 canMapHostMemory=1
                           global memory = 3.9454
HostToDevice PCI Express BW=11.4723 GB/s
DeviceToHost PCI Express BW=11.4917 GB/s

Actual behavior

[I] ➜ cortex logs --random-pod scratch

waiting for pod to initialize ...
test.cu(29) : cudaSafeCall() Runtime API error : no CUDA-capable device is detected.
Starting admin server on :15000
Starting proxy server on :8888
TCP probe to user-provided container port failed: dial tcp 127.0.0.1:8000: connect: connection refused
TCP probe to user-provided container port failed: dial tcp 127.0.0.1:8000: connect: connection refused

I SSH'd into one of the GPU nodes of the cluster and ran nvidia-smi. Output:

[root@ip-10-0-108-88 ~]# nvidia-smi
Fri Jul 30 16:58:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Since the node has the GPU properly configured, could it be an issue with the scheduler configuration?
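
For reference, a quick sanity check that the Kubernetes scheduler actually sees the GPU as an allocatable resource (the node name is a placeholder, and the NVIDIA device plugin may live in a different namespace depending on the cluster setup):

# is nvidia.com/gpu listed under the node's Capacity/Allocatable?
kubectl describe node <gpu-node-name> | grep -i "nvidia.com/gpu"

# is the NVIDIA device plugin DaemonSet running on the GPU nodes?
kubectl get daemonsets --all-namespaces | grep -i nvidia
kubectl get pods --all-namespaces -o wide | grep -i nvidia

If the resource doesn't show up there, the device plugin or driver setup on the node is a more likely culprit than the API spec itself.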

cc/ @deliahu plz halp!

@creatorrr I've run the following API on the latest version of Cortex and it seems to be working for me:

- name: pytorch
  kind: RealtimeAPI
  pod:
    containers:
    - name: api
      image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
      command:
        - /bin/bash
        - "-c"
        - |
          python - << EOF
          import socket, sys, os, time
          import threading as td

          def start_health_probe():
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            server_address = ("localhost", int(os.getenv("CORTEX_PORT", "8080")))
            print(f"starting up on {server_address[0]} port {server_address[1]}")
            sock.bind(server_address)
            sock.listen(1)
            while True:
                connection, client_address = sock.accept()
          td.Thread(target=start_health_probe).start()

          import torch
          while True:
            if torch.cuda.is_available():
              print("gpu(s) are available")
            else:
              print("no available gpu")
            time.sleep(10)
          EOF
      env:
        PYTHONUNBUFFERED: "1"
      compute:
        cpu: 200m
        mem: 128Mi
        gpu: 1

Running the above deployment will print "gpu(s) are available" every 10 seconds. I ran this on a g4dn.xlarge node. Does this work for you?
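
To try it out, save the spec above as, say, cortex.yaml (any file name works), then:

# deploy the API, check its status, and tail a pod's logs
cortex deploy cortex.yaml
cortex get pytorch
cortex logs --random-pod pytorch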

Yup @RobertLucian, your snippet works for me too. I think it may have something to do with the NVIDIA driver version. Closing for now; will report back when I have updates.
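
(For anyone landing here later, a quick way to check the driver angle on a GPU node; the CUDA base image tag below is just an example and may need adjusting:)

# installed driver version (the "CUDA Version" in the nvidia-smi banner is the newest runtime that driver supports)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# confirm the container runtime + driver combination works with a plain CUDA base image
docker run --rm --gpus all nvidia/cuda:11.2.2-base-ubuntu20.04 nvidia-smi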