zeux/volk

VK_EXT_debug_utils initialization

kennyalive opened this issue · 14 comments

VK_EXT_debug_utils pointers are retrieved in volkGenLoadDevice. Would it be more appropriate to move that code to volkGenLoadInstance?

Right now I have a valid vkCmdBeginDebugUtilsLabelEXT pointer after the volkLoadInstance call, but the subsequent volkLoadDevice call sets it to null.

zeux commented

This shouldn't really happen as far as I know. I'll see if I can reproduce this myself. Can you give more details on the HW/SW that you're getting this on? Vulkan SDK version, GPU, driver version, whether you're using RenderDoc.

zeux commented

This might be a loader bug; it looks a bit similar to KhronosGroup/Vulkan-Loader#33. In that case, if no layer intercepts this entrypoint, the loader stub used to crash. This was later fixed (in KhronosGroup/Vulkan-Loader@4b858b5), but the fix doesn't affect the behavior of vkGetDeviceProcAddr, only the behavior of calling the loader stub. So that's probably what's going on here; in this case I think it's a loader bug - I'd expect both GetInstanceProcAddr and GetDeviceProcAddr to return a valid function pointer when the extension support is advertised and enabled.

Details (from memory, not in the office now): Win10 x64, GeForce 1060, SDK 1.1.82(.0?), drivers 398.xx. I did not use RenderDoc during the test, but RenderDoc and its layer are installed on the system.

It really looks like a loader bug as you described, because it works fine if I enable the standard validation layer.

zeux commented

Sorry for the delay; I've confirmed that the repro for this is trivial - just enable VK_EXT_debug_utils without any other extensions. I believe the guess above about this being a loader bug is correct, and I've updated a currently-open issue KhronosGroup/Vulkan-Loader#33; hopefully this can be addressed. I'll keep this issue open for now.

Ok, thanks.

zeux commented

This is now fixed in the loader; the new SDK published today (1.1.92) contains the loader that doesn't have this problem. The fix landed in PR KhronosGroup/Vulkan-Loader#95

I've encountered the issue again. On AMD drivers with Vulkan SDK 1.1.114.0, vkCmdBeginDebugUtilsLabelEXT is null when retrieved with vkGetDeviceProcAddr and non-null when retrieved with vkGetInstanceProcAddr.
I've checked the latest spec, and according to section 3.1, Table 2, vkGetDeviceProcAddr should not be used to get function pointers of instance-level extensions. Do you think this should be fixed in the library?

A bit more context that could be interesting: our codebase loads the ext_debug_utils extension using vkGetInstanceProcAddr, which returns valid pointers, but markers inserted with these functions are not visible in RGP. On the other hand, vkGetDeviceProcAddr returns non-null pointers when RGP is active, and in this case markers work as expected. Unfortunately, when RGP is not open, vkGetDeviceProcAddr returns null for the marker functions.

Right now it looks like I have to use vkGetInstanceProcAddr to get the ext_debug_utils function pointers and then overwrite them with the vkGetDeviceProcAddr results if those are not null.
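For illustration, the workaround described above could be sketched roughly as follows. This is a minimal sketch that avoids vulkan.h so it can be compiled on its own: `void_fn` and `lookup_fn` are hypothetical stand-ins for PFN_vkVoidFunction and for vkGetInstanceProcAddr / vkGetDeviceProcAddr with the handle already bound, and the mock lookups only imitate the behavior reported in this thread. None of these names come from volk or the Vulkan headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PFN_vkVoidFunction. */
typedef void (*void_fn)(void);

/* Hypothetical stand-in for a proc-addr query with the instance or
   device handle already bound. */
typedef void_fn (*lookup_fn)(const char *name);

/* Start from the instance-level pointer, then prefer the device-level
   pointer whenever the device lookup returns one. */
static void_fn resolve_with_override(lookup_fn instance_lookup,
                                     lookup_fn device_lookup,
                                     const char *name)
{
    assert(instance_lookup != NULL && device_lookup != NULL && name != NULL);

    void_fn fn = instance_lookup(name);
    void_fn device_fn = device_lookup(name);
    if (device_fn != NULL)
        fn = device_fn;
    return fn;
}

/* Mock entry points; the counters only make the two functions distinct. */
static int instance_calls = 0;
static int device_calls = 0;
static void instance_stub(void) { ++instance_calls; }
static void device_stub(void) { ++device_calls; }

/* Mock of the instance-level lookup: always knows the marker function. */
static void_fn mock_instance_lookup(const char *name)
{
    return strcmp(name, "vkCmdBeginDebugUtilsLabelEXT") == 0 ? instance_stub : NULL;
}

/* Mock of the device-level lookup when no profiler is attached. */
static void_fn mock_device_lookup_plain(const char *name)
{
    (void)name;
    return NULL;
}

/* Mock of the device-level lookup when a profiler is attached. */
static void_fn mock_device_lookup_profiled(const char *name)
{
    return strcmp(name, "vkCmdBeginDebugUtilsLabelEXT") == 0 ? device_stub : NULL;
}
```

With the mocks above, the plain device lookup leaves the instance-level pointer in place, while the profiled one replaces it. In real code the two lookups would wrap the actual Vulkan queries.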

zeux commented

Yeah, I think this is correct; function pointers that come from instance-level extensions should apparently be retrieved via GIPA (vkGetInstanceProcAddr).

zeux commented

In fact this appears to have changed in 1.0.69:

  • Clarify that flink:vkGetDeviceProcAddr only supports device-level
    commands (public issue 655).

KhronosGroup/Vulkan-Docs@ab08f09#diff-57bb29a947a8f7ed267e2e5cbf35045a

zeux commented

It's a bit of a mess; there are basically three types of functions:

  • device-level (core or provided by device extensions); can be loaded by GDPA or GIPA
  • device-level (provided by instance extensions); must be loaded by GIPA
  • instance-level (core or provided by instance extensions); must be loaded by GIPA
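As a rough illustration of the three groups listed above, the classification could be written down like this. The enum, the table, and the helper names are hypothetical and are not part of volk; the table holds only one example command per group and is not a complete list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The three groups from the list above (names are illustrative). */
enum command_group {
    GROUP_DEVICE,               /* device-level; core or device extension */
    GROUP_DEVICE_FROM_INSTANCE, /* device-level; provided by an instance extension */
    GROUP_INSTANCE              /* instance-level; core or instance extension */
};

struct command_info {
    const char *name;
    enum command_group group;
};

/* One example per group. */
static const struct command_info known_commands[] = {
    { "vkCmdDraw",                    GROUP_DEVICE },
    { "vkCmdBeginDebugUtilsLabelEXT", GROUP_DEVICE_FROM_INSTANCE },
    { "vkEnumeratePhysicalDevices",   GROUP_INSTANCE },
};

/* Only the first group may be requested through GDPA;
   every group can be requested through GIPA. */
static bool may_use_gdpa(enum command_group group)
{
    return group == GROUP_DEVICE;
}

/* Looks a command up by name; returns true and writes its group
   when the name is in the table, false otherwise. */
static bool find_group(const char *name, enum command_group *out)
{
    assert(name != NULL && out != NULL);

    for (size_t i = 0; i < sizeof known_commands / sizeof known_commands[0]; ++i) {
        if (strcmp(known_commands[i].name, name) == 0) {
            *out = known_commands[i].group;
            return true;
        }
    }
    return false;
}
```

A loader built along these lines would query GDPA only when `may_use_gdpa` returns true and use GIPA for everything else.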

volk doesn't handle the second group correctly; the caveat is that it's unclear if the functions from the second group should be classified as device functions from the point of view of VolkDeviceTable.

My current train of thought leads me to exclude device-level functions provided by instance extensions from VolkDeviceTable, since you never need to load them from multiple devices when using explicit mGPU. Technically it's an interface-breaking change, but hopefully debug_utils is the only example here, so it's not too bad.

Sounds good. I only hope that in my AMD situation it will be possible for them (driver? loader?) to fix the marker issue, so that the GDPA call (which is a workaround) won't be needed and it will still be fast.

zeux commented

Please let me know if the PR fixes the issue for you; as far as I understand, this makes volk spec-compliant in this area, but I don't have an AMD system to test this on.

Sure, I will know tomorrow.

Works fine for me on both AMD and NVIDIA (no crashes due to null pointers). As expected, the issue with the AMD profiler still exists, but it's not a volk issue.