[QUESTION] Is it possible to differentiate between heterogeneous i915 devices?
npawelek opened this issue · 9 comments
I'm looking to leverage a Gen 11 integrated GPU and an Arc A580 (attached as an eGPU) that are installed in the same system. Ideally, I would prefer to attach the iGPU and Arc devices to specific workloads. Unfortunately, there's no apparent way to differentiate between the two, as both devices show up as gpu.intel.com/i915 from the plugin.
Is this correct, or am I doing something wrong (and should be able to differentiate)?
```
# lspci -nn | grep -Ei 'VGA|DISPLAY'
00:02.0 VGA compatible controller [0300]: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 01)
31:00.0 VGA compatible controller [0300]: Intel Corporation Device [8086:56a2] (rev 08)
```
```
$ k get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\n'}{' i915: '}{.status.allocatable.gpu\.intel\.com/i915}{'\n'}"
nodea
 i915: 2
```
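For reference, here is roughly how a workload requests a GPU today (a minimal sketch; the pod name and image are made up). There is no way to express which of the two devices should be allocated:

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload            # made-up name
spec:
  containers:
  - name: app
    image: my-gpu-app:latest    # made-up image
    resources:
      limits:
        gpu.intel.com/i915: 1   # could land on either the iGPU or the Arc A580
EOF
```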
You are correct. Unfortunately, the GPU plugin doesn't differentiate between GPU devices.
Any alternative thoughts on working around this? The device the Arc is attached to doesn't support disabling the iGPU. Functionally, I'm not sure there's anything I can do.
Figured this out. Since the BIOS doesn't have an option to disable the iGPU, I added a udev rule that removes the iGPU at boot. Now I'm only seeing the eGPU, which is provided via the plugin.
Great!
Would you mind sharing the udev rule? I'd like to add it to the known issues for a workaround.
While production clusters should have only GPUs of one type on a given node, and per-node device type labels (provided by NFD rules) are enough for handling differences between nodes, I think this is a valid issue, at least for development (cluster) setups.
Disabling the iGPU is annoying because it may be the only thing providing display support (for developer desktop output / local administration) on a given node, so I think other mechanisms would be preferable.
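For the homogeneous-nodes case mentioned above, a plain nodeSelector on an NFD-provided label is enough. A minimal sketch (the label key and value here are illustrative; actual ones depend on the NFD rules in use):

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dgpu-workload                # made-up name
spec:
  nodeSelector:
    example.com/gpu.type: discrete   # hypothetical NFD-provided label
  containers:
  - name: app
    image: my-gpu-app:latest         # made-up image
    resources:
      limits:
        gpu.intel.com/i915: 1
EOF
```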
Using VMs
Another option is creating a VM for each GPU type in the host, and passing all GPUs of a given type to one VM. That way the iGPU VM node can be used for running less demanding GPU workloads. This will also work for the case where all GPUs are discrete Intel ones, just of different types.
Splitting the host like this will obviously reduce the amount of system RAM and disk available for the workloads, but VM usage is normal for clusters.
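A rough sketch of what the dGPU passthrough could look like with VFIO and QEMU (the PCI address and device ID come from the lspci output above; the rest is illustrative and untested):

```
# Detach the Arc card from the host driver and bind it to vfio-pci.
modprobe vfio-pci
echo 0000:31:00.0 > /sys/bus/pci/devices/0000:31:00.0/driver/unbind
echo "8086 56a2" > /sys/bus/pci/drivers/vfio-pci/new_id

# Pass the device through to the VM (most QEMU options omitted).
qemu-system-x86_64 -enable-kvm -m 8G \
  -device vfio-pci,host=0000:31:00.0 \
  -drive file=node-vm.qcow2,if=virtio    # made-up disk image
```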
GPU plugin options
I see two potential solutions on the plugin side:
- GPU plugin (option for) ignoring iGPUs (not providing them as GPU resources), either on all nodes, or only on nodes that have more than one Intel GPU
- GPU plugin (option for) providing iGPUs under a separate resource name, e.g. "igpu"

All of those could be provided by a multi-selection option --igpu <ignore,ignore-mixed,separate-resource>, but personally I would prefer the last one (e.g. --igpu-resource), as iGPUs are perfectly fine for more lightweight tasks.
@tkatila Would GAS work as-is without any need for modifications for these?
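For illustration, a crude heuristic such an option could use to classify devices (just a sketch; it relies on the convention that the integrated GPU sits on PCI bus 0000:00, which is typical but not guaranteed):

```
# Classify Intel display devices as integrated vs. discrete by PCI address.
for dev in /sys/bus/pci/devices/*; do
  # 0x03 is the PCI display controller class, 0x8086 is Intel.
  case "$(cat "$dev/class")" in 0x03*) ;; *) continue ;; esac
  [ "$(cat "$dev/vendor")" = "0x8086" ] || continue
  case "$(basename "$dev")" in
    0000:00:*) echo "$(basename "$dev") integrated" ;;
    *)         echo "$(basename "$dev") discrete" ;;
  esac
done
```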
> Using VMs
> Another option is creating a VM for each GPU type in the host, and passing all GPUs of a given type to one VM. That way the iGPU VM node can be used for running less demanding GPU workloads. This will also work for the case where all GPUs are discrete Intel ones, just of different types.
> Splitting the host like this will obviously reduce the amount of system RAM and disk available for the workloads, but VM usage is normal for clusters.

As well as being labor intensive: maintaining multiple hosts, etc.
> I see two potential solutions on the plugin side:
> - GPU plugin (option for) ignoring iGPUs (not providing them as GPU resources), either on all nodes, or only on nodes that have more than one Intel GPU
> - GPU plugin (option for) providing iGPUs under a separate resource name, e.g. "igpu"
>
> All of those could be provided by a multi-selection option --igpu <ignore,ignore-mixed,separate-resource>, but personally I would prefer the last one (e.g. --igpu-resource), as iGPUs are perfectly fine for more lightweight tasks.
>
> @tkatila Would GAS work as-is without any need for modifications for these?
GAS use would break for GPU resources that are not i915. Its scheduling is tied to the use of i915 resources AFAIR.
We've been asked (once or twice) for ways to differentiate GPUs. One way to do that would be to rename the i915 resource based on the GPU model. Flex140 could become i915-flex140, Arc => i915-arc770, Integrated GPUs => i915-igpu etc. This would allow targeting different GPUs on heterogeneous hosts. The downsides would be the need to change Pod specifications and that GAS wouldn't work anymore.
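A Pod would then target a model explicitly, along these lines (hypothetical resource name following the scheme above):

```
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: arc-workload               # made-up name
spec:
  containers:
  - name: app
    image: my-gpu-app:latest       # made-up image
    resources:
      limits:
        gpu.intel.com/i915-arc770: 1   # hypothetical renamed resource
EOF
```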
I think a better separation would be by purpose / product family (Flex, Max etc.), but I also think iGPUs are a bit of a special case, as they cannot be shuffled between nodes like dGPUs (to make sure all GPUs on a given node are of the same model).
> GAS use would break for GPU resources that are not i915. Its scheduling is tied to the use of i915 resources AFAIR.

Ok, so the iGPU ignore option would be fine with it. Whereas with the igpu resource option, documentation would need to state that GAS does not handle them, which I think would be fine.
> Great!
> Would you mind sharing the udev rule? I'd like to add it to the known issues for a workaround.
```
# cat /etc/udev/rules.d/01-persistent-igpu.rules
ACTION=="add", KERNEL=="0000:00:02.0", SUBSYSTEM=="pci", RUN+="/bin/sh -c 'echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove'"
```
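The rule takes effect on the next boot. To apply it immediately, something like this should work (untested sketch; note it removes the iGPU right away, taking any local display with it):

```
# Reload udev rules and replay the "add" event for the iGPU so the rule fires.
udevadm control --reload-rules
udevadm trigger --action=add /sys/bus/pci/devices/0000:00:02.0
```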
Having an ignore option would be great!