nytimes/rd-blender-docker

Support for OptiX

emaincourt opened this issue · 3 comments

Hi,

First of all, thanks for the great work you've done. The images you provide are very useful. However, I'm currently trying to render frames using NVIDIA OptiX, but the container does not seem to locate any OptiX device other than the CPU. It works perfectly with CUDA. Also, if I run Blender directly on the host machine, the OptiX device (an NVIDIA T4) is detected properly.

Do you have any hints?

Thanks in advance

For anyone interested, all you need to do is mount the following files into the container:

  • /usr/lib64/libnvoptix.so.1
  • /usr/lib64/libnvoptix.so.418.56
  • /usr/lib64/libnvidia-rtcore.so.418.56

Then in our case Blender started to find our GPUs as OptiX devices.
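
For reference, here is a minimal docker run sketch of those bind mounts, assuming the nytimes/blender:2.91-gpu-ubuntu18.04 image mentioned below; the .418.56 suffixes must match whatever driver version your host actually ships:

# Illustrative only: bind-mount the host OptiX libraries into the container
docker run --gpus all \
  -v /usr/lib64/libnvoptix.so.1:/usr/lib64/libnvoptix.so.1 \
  -v /usr/lib64/libnvoptix.so.418.56:/usr/lib64/libnvoptix.so.418.56 \
  -v /usr/lib64/libnvidia-rtcore.so.418.56:/usr/lib64/libnvidia-rtcore.so.418.56 \
  nytimes/blender:2.91-gpu-ubuntu18.04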

Happy it worked out and thanks for the context @emaincourt.

Just tried running nytimes/blender:2.91-gpu-ubuntu18.04 and used the script below to inspect device.type when running headless, and I am able to see my GPU as an OPTIX device.

import bpy

# The Cycles add-on preferences expose the detected compute devices.
preferences = bpy.context.preferences.addons['cycles'].preferences
# get_devices() refreshes the device list; the return value is unused here.
cuda_devices, opencl_devices = preferences.get_devices()

for device in preferences.devices:
    print('-------------------------')
    print('Found device {} of type {}'.format(device.name, device.type))

And the output is
optix

Which NVIDIA driver are you using?

Tested on Ubuntu 20.04.1 LTS with NVIDIA driver version 460.32.03, CUDA version 11.2.

@juniorxsound Thanks for your answer! Sorry for the late follow-up.

Actually, things were not 100% fixed at that point. Without mounting any files, we were not able to find our device. Once we mounted the files, we could finally see it, but trying to allocate an OptiX context kept crashing, until we realised we had been mounting the wrong files for our NVIDIA driver version (what we had mounted -> what we should have mounted):

  • /usr/lib64/libnvoptix.so.1 -> /usr/lib64/libnvoptix.so.1
  • /usr/lib64/libnvoptix.so.418.56 -> /usr/lib64/libnvoptix.so.450.51.06
  • /usr/lib64/libnvidia-rtcore.so.418.56 -> /usr/lib64/libnvidia-rtcore.so.450.51.06

Please find below the output of nvidia-smi on the host machine:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   24C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
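
For anyone hitting the same mismatch, a quick sanity check on the host is to compare the driver version against the versioned libraries you are about to mount (paths are distro-dependent; /usr/lib64 here, often /usr/lib/x86_64-linux-gnu on Ubuntu):

# Driver version as reported by nvidia-smi
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# The versioned OptiX/rtcore libraries should carry the same suffix
ls /usr/lib64/libnvoptix.so.* /usr/lib64/libnvidia-rtcore.so.*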

I can't explain why it is not able to identify the GPU without mounting the files. It might be related to the driver version that AWS provides (450.51.06 vs 460.32.03).