IGCIT/Intel-GPU-Community-Issue-Tracker-IGCIT

BoolkaEngine crash when using unaligned vertex struct in mesh shader

Opened this issue · 7 comments

Checklist [README]

  • Device is not undervolted nor overclocked
  • Device is using the latest drivers
  • Application is not cracked or modded and uses the latest patch

Application [Required]

BoolkaEngine

Processor / Processor Number [Required]

AMD Ryzen 5 3600 6-Core Processor

Graphic Card [Required]

Intel(R) Arc(TM) A770 Graphics

GPU Driver Version [Required]

31.0.101.5382

Other GPU Driver version

No response

Rendering API [Required]

  • Vulkan
  • OpenGL
  • DirectX12
  • DirectX11
  • DirectX10
  • DirectX9
  • Not applicable

Windows Build Number [Required]

  • Windows 11 23H2
  • Windows 11 22H2
  • Windows 11 21H2
  • Windows 10 22H2
  • Windows 10 21H2
  • Other (Please specify)

Other Windows build number

No response

Intel System Support Utility report

igcit_ssu.txt

Description and steps to reproduce [Required]

  1. Download latest release of BoolkaEngine - https://github.com/Devaniti/BoolkaEngine/releases/tag/v0.2
  2. Extract all files
  3. Run start.bat
  4. Application will start and immediately crash

BoolkaEngine is my D3D12 engine pet project.
It works just fine on Nvidia/AMD GPUs.

The issue seems to be related to the execution of Mesh Shaders.
There is a commit with a workaround for this crash - Devaniti/BoolkaEngine@02855c2
That workaround changes the layout of the Vertex struct used with Mesh Shaders.
Since there are no relevant restrictions on the layout of the vertex struct, it is highly likely that this is not undefined behavior in BoolkaEngine, but rather a mishandling of the vertex struct layout in mesh shaders by the Intel driver.

Device / Platform

No response

Crash dumps [Required, if applicable]

No response

Application / Windows logs

No response

@Devaniti hiii and welcome!
I provide support for Game/App developers and I will be assisting you in this case
Let me confirm this crash and I'll be back with my findings. If I have questions I'll ping you right back :)

Karen

Heey @Devaniti quick update!
I could verify the correct execution of the scene using the build on my NVIDIA RTX 3050, but unfortunately it crashes on my Arc with driver v.5382. I have also performed a small regression test like you suggested and the behavior is the same, so I'll be creating an internal report for this.
A couple questions for my report:

  1. Is there an official DX12 doc that you followed to conclude that the matrix should be returned the way you originally did it? If so, please share it
  2. Can you share how many users (give or take) might be impacted?

Thanks, looking forward to hearing from you :)

Karen

  1. The most relevant document is this one - https://microsoft.github.io/DirectX-Specs/d3d/MeshShader.html#vertex-attributes
    There is no restriction on the structure of the Vertex Attributes, only the requirements to specify a semantic for each field of the structure and to have a 4-component element with the SV_Position semantic. Both are fulfilled in the versions before and after the workaround.
  2. This is more of a code sample than an application that people would actively use. So the number of users impacted by the crash in the app itself is about 0, but people may use this code, which crashes on Intel Arc, as a reference in other projects. Either way, I'd expect quite a small number of people to be affected by this bug.

As for the workaround having the same behavior: on my end, the workaround does fix the crash on Intel Arc.
To ensure that we are running the same code in each case, you can build both versions from scratch:

  1. Clone the https://github.com/Devaniti/BoolkaEngine repo
  2. Run HelperScripts/QuickStart.bat on the main branch to observe the crash
  3. Run HelperScripts/QuickStart.bat on the IntelArcWorkaround branch to observe it working with the workaround

That script will build the project, download and prepare the scene, and run the app.

Ty @Devaniti
I have been able to run both branches, but I'd rather focus on the one without the WA and see what we can do on the driver side.
Edit: doing some research. Will update soon

Karen

@Devaniti Just a quick comment:
Looking at the document you shared
image

This function requires a 4-component vector, and looking at the WA code, you did just that: changed int to int4, and float2 and float3 to float4, right?

So it seems to me that the WA is the correct way to use this function.
I assume that Nvidia/AMD drivers are converting those non-4-component vectors into valid 4-element vectors.

Do you agree?

In Vertex Attributes, you are required to have one 4-component attribute with the SV_Position semantic. If that attribute is not a 4-component vector, shader compilation fails with "error : SV_Position must be float4.".
As you can see, both before and after the workaround there's float4 position : SV_Position;, which satisfies that requirement, and the shaders build successfully in both cases.
And since the highlighted requirement does not restrict other attributes, it is valid for other attributes to have other sizes.
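
To make the comparison concrete, here is a minimal sketch of the two kinds of layout being discussed. It is an illustration only: apart from position, the field names and semantics are hypothetical and are not copied from BoolkaEngine, so the actual structs in the repo may differ.

```hlsl
// Hypothetical layout before the workaround: attributes of mixed sizes.
// Only `position` comes from the discussion above; the other fields are illustrative.
struct VertexOriginal
{
    float4 position   : SV_Position; // the one attribute that must be float4
    float3 normal     : NORMAL;      // 3-component attribute
    float2 uv         : TEXCOORD0;   // 2-component attribute
    int    materialID : MATERIALID;  // scalar integer attribute
};

// Hypothetical layout after the workaround: every attribute widened to 4 components.
struct VertexWorkaround
{
    float4 position   : SV_Position; // unchanged
    float4 normal     : NORMAL;      // .w unused
    float4 uv         : TEXCOORD0;   // .zw unused
    int4   materialID : MATERIALID;  // .yzw unused
};
```

Both structs give every field a semantic and include a float4 SV_Position, so under the reading of the spec given above either one should be a valid mesh shader vertex output type.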

This official mesh shader code sample uses non-4-component attributes as well:
https://github.com/microsoft/DirectX-Graphics-Samples/blob/master/Samples/Desktop/D3D12MeshShaders/src/MeshletCull/MeshletCommon.hlsli
https://github.com/microsoft/DirectX-Graphics-Samples/blob/master/Samples/Desktop/D3D12MeshShaders/src/MeshletCull/MeshletMS.hlsl

image
I was able to run that sample (MeshletCull) using the A770...
I will run this sample with an NV GPU to see if there is any difference
