KhronosGroup/Vulkan-ExtensionLayer

shader_object layer is triggering validation errors and causing our driver to crash

gary-sweet opened this issue · 16 comments

The cause of the crash, from what I can see, is that the shader_object layer requests that anything that can be dynamic state is requested as dynamic when the device is created. However, a bunch of that dynamic state is never actually set prior to making draw calls.

For example, if I run:

VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_shader_object:VK_LAYER_KHRONOS_validation ./deqp-vk -n dEQP-VK.pipeline.shader_object_unlinked_spirv.stencil.format.s8_uint.states.fail_keep.pass_keep.dfail_keep.comp_never

I see the following validation errors, followed by a driver crash. The final two errors are related to the missing dynamic state:

VUID-VkDeviceCreateInfo-pNext-06532(ERROR / SPEC): msgNum: -865700978 - Validation Error: [ VUID-VkDeviceCreateInfo-pNext-06532 ] Object 0: handle = 0x55a1e1ab4750, type = VK_OBJECT_TYPE_PHYSICAL_DEVICE; | MessageID = 0xcc66738e | vkCreateDevice():  If the pNext chain includes a VkPhysicalDeviceVulkan13Features structure, then it must not include a VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DYNAMIC_RENDERING_FEATURES structure. The Vulkan spec states: If the pNext chain includes a VkPhysicalDeviceVulkan13Features structure, then it must not include a VkPhysicalDeviceDynamicRenderingFeatures, VkPhysicalDeviceImageRobustnessFeatures, VkPhysicalDeviceInlineUniformBlockFeatures, VkPhysicalDeviceMaintenance4Features, VkPhysicalDevicePipelineCreationCacheControlFeatures, VkPhysicalDevicePrivateDataFeatures, VkPhysicalDeviceShaderDemoteToHelperInvocationFeatures, VkPhysicalDeviceShaderIntegerDotProductFeatures, VkPhysicalDeviceShaderTerminateInvocationFeatures, VkPhysicalDeviceSubgroupSizeControlFeatures, VkPhysicalDeviceSynchronization2Features, VkPhysicalDeviceTextureCompressionASTCHDRFeatures, or VkPhysicalDeviceZeroInitializeWorkgroupMemoryFeatures structure (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkDeviceCreateInfo-pNext-06532)
    Objects: 1
        [0] 0x55a1e1ab4750, type: 2, name: NULL

Test case 'dEQP-VK.pipeline.shader_object_unlinked_spirv.stencil.format.s8_uint.states.fail_keep.pass_keep.dfail_keep.comp_never'..
VUID-VkGraphicsPipelineCreateInfo-pLibraries-06634(ERROR / SPEC): msgNum: -117483831 - Validation Error: [ VUID-VkGraphicsPipelineCreateInfo-pLibraries-06634 ] | MessageID = 0xf8ff56c9 | vkCreateGraphicsPipelines(): pCreateInfos[0] Fragment Shader and Fragment Output Interface were created with different VkPipelineMultisampleStateCreateInfo.Fragment Shader pMultisampleState:
        pNext: (nil)
        rasterizationSamples: VK_SAMPLE_COUNT_1_BIT
        sampleShadingEnable: 0
        minSampleShading: 0.000000
        pSampleMask: (nil)
        alphaToCoverageEnable: 0
        alphaToOneEnable: 0
Fragment Output Interface pMultisampleState:
        pNext: (nil)
        rasterizationSamples: VK_SAMPLE_COUNT_1_BIT
        sampleShadingEnable: 0
        minSampleShading: 0.000000
        pSampleMask: 0x55a1e5e77330
        alphaToCoverageEnable: 0
        alphaToOneEnable: 0
. The Vulkan spec states: If an element of VkPipelineLibraryCreateInfoKHR::pLibraries was created with VK_GRAPHICS_PIPELINE_LIBRARY_FRAGMENT_SHADER_BIT_EXT with a pMultisampleState that was not NULL, and if VkGraphicsPipelineLibraryCreateInfoEXT::flags includes VK_GRAPHICS_PIPELINE_LIBRARY_FRAGMENT_OUTPUT_INTERFACE_BIT_EXT, pMultisampleState must be identically defined to that used to create the library (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkGraphicsPipelineCreateInfo-pLibraries-06634)
    Objects: 0
VUID-vkCmdDraw-None-07845(ERROR / SPEC): msgNum: 1469182825 - Validation Error: [ VUID-vkCmdDraw-None-07845 ] Object 0: handle = 0x55a1e1a66870, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x3d000000003d, type = VK_OBJECT_TYPE_PIPELINE; | MessageID = 0x5791f369 | vkCmdDraw():  VK_DYNAMIC_STATE_DEPTH_COMPARE_OP state not set for this command buffer. The Vulkan spec states: If the bound graphics pipeline state was created with the VK_DYNAMIC_STATE_DEPTH_COMPARE_OP dynamic state enabled then vkCmdSetDepthCompareOp must have been called in the current command buffer prior to this drawing command (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkCmdDraw-None-07845)
    Objects: 2
        [0] 0x55a1e1a66870, type: 6, name: NULL
        [1] 0x3d000000003d, type: 19, name: NULL
VUID-vkCmdDraw-pStrides-04913(ERROR / SPEC): msgNum: 2005986124 - Validation Error: [ VUID-vkCmdDraw-pStrides-04913 ] Object 0: handle = 0x55a1e1a66870, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x3d000000003d, type = VK_OBJECT_TYPE_PIPELINE; | MessageID = 0x7790eb4c | vkCmdDraw():  VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE state not set for this command buffer. The Vulkan spec states: If the bound graphics pipeline was created with the VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE_EXT dynamic state enabled, but without the VK_DYNAMIC_STATE_VERTEX_INPUT_EXT dynamic state enabled, then vkCmdBindVertexBuffers2EXT must have been called in the current command buffer prior to this draw command, and the pStrides parameter of vkCmdBindVertexBuffers2EXT must not be NULL (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkCmdDraw-pStrides-04913)
    Objects: 2
        [0] 0x55a1e1a66870, type: 6, name: NULL
        [1] 0x3d000000003d, type: 19, name: NULL

There are other failures, this just happens to be one of the first tests that runs in a *shader_object* list and the crash prevents further tests running.

@gary-sweet The validation errors in regards with dynamic state not being set come from the difference between what the spec requires from shader objects and what it requires from pipelines. There are spec changes in progress to remove the VUIDs you are seeing here and have the same requirements from pipelines and shader objects. There are CTS tests here https://gerrit.khronos.org/c/vk-gl-cts/+/12612 that test this for pipelines, can you check if these pass?

@gary-sweet The validation errors in regards with dynamic state not being set come from the difference between what the spec requires from shader objects and what it requires from pipelines. There are spec changes in progress to remove the VUIDs you are seeing here and have the same requirements from pipelines and shader objects. There are CTS tests here https://gerrit.khronos.org/c/vk-gl-cts/+/12612 that test this for pipelines, can you check if these pass?

@ziga-lunarg From our automated internal CTS testing I can see that we are passing those exploratory tests, so something isn't quite adding up here.

The crash I'm seeing looks like it might be related to the last validation error is listed above.

The crash is caused because VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is set, but nothing has ever called vkCmdBindVertexBuffers2() in order to set the stride before the first draw call happens. That results in an undefined stride value being used.

I've looked over the tests and a case for VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is indeed missing, I will add it.

@gary-sweet I am guessing that VK_EXT_vertex_input_dynamic_state is not supported on your hardware?

@gary-sweet I am guessing that VK_EXT_vertex_input_dynamic_state is not supported on your hardware?

@ziga-lunarg We have a mix, older h/w does not support it, newer h/w does.

And is this issue seen on both?

And is this issue seen on both?

Yes, I've just confirmed that.

The crash is caused because VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is set, but nothing has ever called vkCmdBindVertexBuffers2() in order to set the stride before the first draw call happens

I've looked over the tests and a case for VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is indeed missing, I will add it.

Are you suggesting that this is somehow allowed in the spec? If so, can you point me at the relevant text?

I'm not sure yet, I think there might be a bug with stride, I will look into it, but there is a spec change in progress to remove a lot of the VUIDs that require setting dynamic state that isn't actually used, it is mentioned here: https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4675#note_453032

If stride is actually used and needed, then it's definitely a bug in the layer.

@gary-sweet After some investigation I see that the test should set the stride with vkCmdSetVertexInputEXT() which is called in vkPipelineConstructionUtil.cpp:4060. Is that not the case for you?

@gary-sweet After some investigation I see that the test should set the stride with vkCmdSetVertexInputEXT() which is called in vkPipelineConstructionUtil.cpp:4060. Is that not the case for you?

Ok, so things are becoming clearer; I do see that being called, and I see that our driver picks that value up. I then also see that we check whether VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is set and use the value of that instead. In this case, that value was never set.

I now notice some spec description that I hadn't seen before:

vkCmdSetVertexInputEXT says "If drawing using shader objects, or if the bound pipeline state object was also created with the VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE dynamic state enabled, then vkCmdBindVertexBuffers2 can be used instead of vkCmdSetVertexInputEXT to dynamically set the stride."

and vkCmdBindVertexBuffers2 says "If drawing using shader objects or if the bound pipeline state object was also created with the VK_DYNAMIC_STATE_VERTEX_INPUT_EXT dynamic state enabled then vkCmdSetVertexInputEXT can be used instead of vkCmdBindVertexBuffers2 to set the stride."

That's a subtlety that had obviously escaped me previously. Essentially the stride can be set by either of those paths. My driver is currently assuming that if VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE is set, we should find a valid stride set for it, which doesn't actually seem to be a requirement based on those extra spec paragraphs.

I'll modify the driver based on that which will hopefully resolve the issue. I'll post back when I have the result.

Ok - so taking that subtlety into account in the driver resolves the issue.

Apologies for the noise. I'll close the issue now.

Not an issue with the layer after all.

Thanks for verifying!