GPU not detected for Realtime API
creatorrr opened this issue · 5 comments
Version
0.39.1
Description
Tried to run a GPU RealtimeAPI on Cortex 0.39, but the GPU is not being detected for some reason. Running the same image directly on a local machine works fine.
Configuration
- name: scratch
  kind: RealtimeAPI
  pod:
    port: 8000
    containers:
      - name: api
        image: nricklin/ubuntu-gpu-test
        compute:
          gpu: 1
Expected behavior
[I] ➜ docker run --gpus all --rm nricklin/ubuntu-gpu-test
Number of CUDA Devices = 1
===========================
Device 0 has name Quadro P2000 with compute capability 6.1 canMapHostMemory=1
global memory = 3.9454
HostToDevice PCI Express BW=11.4723 GB/s
DeviceToHost PCI Express BW=11.4917 GB/s
Actual behavior
[I] ➜ cortex logs --random-pod scratch
waiting for pod to initialize ...
test.cu(29) : cudaSafeCall() Runtime API error : no CUDA-capable device is detected.
Starting admin server on :15000
Starting proxy server on :8888
TCP probe to user-provided container port failed: dial tcp 127.0.0.1:8000: connect: connection refused
TCP probe to user-provided container port failed: dial tcp 127.0.0.1:8000: connect: connection refused
ssh'd into one of the GPU nodes of the cluster and ran nvidia-smi. Output:
[root@ip-10-0-108-88 ~]# nvidia-smi
Fri Jul 30 16:58:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 26C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Since the node has the GPU properly configured, could it be an issue with the scheduler configuration?
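One quick check from inside the failing pod would be to list the GPU device nodes. This is a sketch, not anything Cortex-specific: it assumes the NVIDIA container runtime is supposed to inject /dev/nvidia* device nodes into the container, and if none show up, CUDA will report exactly the "no CUDA-capable device is detected" error above.

```python
import glob

def visible_nvidia_devices():
    # The NVIDIA container runtime normally injects /dev/nvidia* device
    # nodes into GPU containers; if none are present, CUDA init fails
    # with "no CUDA-capable device is detected".
    return sorted(glob.glob("/dev/nvidia*"))

print(visible_nvidia_devices())
```

An empty list inside the pod (while nvidia-smi works on the node) would point at the runtime/scheduler layer rather than the driver.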
cc/ @deliahu plz halp!
@creatorrr I've run the following API on the latest version of Cortex and it seems to be working for me:
- name: pytorch
  kind: RealtimeAPI
  pod:
    containers:
      - name: api
        image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
        command:
          - /bin/bash
          - "-c"
          - |
            python - << EOF
            import socket, sys, os, time
            import threading as td

            def start_health_probe():
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                server_address = ("localhost", int(os.getenv("CORTEX_PORT", "8080")))
                print(f"starting up on port {server_address}")
                sock.bind(server_address)
                sock.listen(1)
                while True:
                    connection, client_address = sock.accept()

            td.Thread(target=start_health_probe).start()

            import torch
            while True:
                if torch.cuda.is_available():
                    print("gpu(s) are available")
                else:
                    print("no available gpu")
                time.sleep(10)
            EOF
        env:
          PYTHONUNBUFFERED: "1"
        compute:
          cpu: 200m
          mem: 128Mi
          gpu: 1
Running the above deployment prints "gpu(s) are available" every 10 seconds. I ran this on a g4dn.xlarge node. Does this work for you?
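For reference, the inline script above keeps a bare TCP listener open only so that Cortex's TCP readiness probe succeeds while the GPU check loops. Here is the same pattern as a standalone, stdlib-only sketch (the CORTEX_PORT variable and 8080 fallback mirror the script; outside a Cortex pod they are just assumptions):

```python
import os
import socket
import threading
import time

def start_health_probe(port):
    # Accept TCP connections so a TCP readiness probe sees the port as open.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("localhost", port))
    sock.listen(1)
    while True:
        conn, _ = sock.accept()
        conn.close()

# Cortex sets CORTEX_PORT inside the pod; 8080 is a local fallback.
port = int(os.getenv("CORTEX_PORT", "8080"))
threading.Thread(target=start_health_probe, args=(port,), daemon=True).start()
time.sleep(0.2)  # give the listener a moment to bind

# Connect once, the way a TCP health check would.
with socket.create_connection(("localhost", port), timeout=2):
    print("probe ok")
```

This is why the failing API logs "connection refused": the container exits at CUDA init before anything ever binds port 8000, so the probe has nothing to connect to.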
Yup @RobertLucian, your snippet works for me too. I think it may have something to do with the NVIDIA driver version. Closing for now; will report back when I have updates.
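If the driver version is the suspect, comparing it across nodes is straightforward to script. A small stdlib sketch (assumes nvidia-smi is on PATH wherever a driver is installed, and returns None otherwise rather than raising):

```python
import shutil
import subprocess

def nvidia_driver_version():
    # Query the driver version via nvidia-smi; None means no driver /
    # no nvidia-smi on this host, so the caller can flag the node.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    version = result.stdout.strip()
    return version or None

print(nvidia_driver_version())
```

Running this on each GPU node (e.g. over ssh) would quickly show whether some nodes are on an older driver than the 460.73.01 shown above.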