Device side enqueue causes a crash in bizarre circumstances/possible compiler bug
20k opened this issue · 3 comments
Hi there! I recently bought a 6700xt and have been running into a few issues with the OpenCL support, top of which is that device side enqueues via enqueue_kernel seem to cause a crash in some circumstances. Someone pointed me over here and said I should file a bug. If this isn't the appropriate place, I apologise! :)
Unfortunately, while producing a minimal repro, I discovered that the crash depends on surrounding unused code. I had to chop this example down from a much larger source, and chopping it down further is difficult.
Built with mingw64's gcc, linking with -lopencl.
The output I get is this:
NAME __relauncher_generic_block_invoke_kernel
NAME get_geodesic_path
NAME relauncher_generic
Result 0
0x00007FFBCC2FCF69 (0x0000000000000006 0x000000CB517FF9D0 0x000002172B3C31A0 0x0000000000000000), clGetPipeInfo() + 0xD46B9 bytes(s)
0x00007FFBCC303C2D (0x000000CB517FFC00 0x0000000000000001 0x000002172A028D20 0x000002172A029AB0), clGetPipeInfo() + 0xDB37D bytes(s)
0x00007FFBCC303266 (0x000002172A5F6950 0x0000000000000000 0x000002172B3C31B8 0x0000000000000000), clGetPipeInfo() + 0xDA9B6 bytes(s)
0x00007FFBCC238EEE (0x000002172A5F6950 0x000002172B3C31A0 0x000002172A5F6950 0x000002172A5F6CC0), clGetPipeInfo() + 0x1063E bytes(s)
0x00007FFBCC238FE1 (0x000002172A4EDE00 0x0000000000000000 0x000002172A4EDE00 0x0000000000000000), clGetPipeInfo() + 0x10731 bytes(s)
0x00007FFBCC2294DA (0x000002172A4EDE00 0x0000000000000000 0x0000000000000000 0x0000000000000000), clGetPipeInfo() + 0xC2A bytes(s)
0x00007FFBCC248CD9 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), clGetPipeInfo() + 0x20429 bytes(s)
0x00007FFC0DAC7034 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), BaseThreadInitThunk() + 0x14 bytes(s)
0x00007FFC0F962651 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x21 bytes(s)
The specific crash line appears to be clFinish(cqueue);
Removing the device side enqueue appears to fix the crash. Removing several other unused pieces of code (e.g. the other unused kernel, or the marked completely unused function) also appears to fix the crash, as does reworking some of the code in various miscellaneous ways. Removing either of the first two build flags also seems to fix the crash.
The only kernel which is actually run there is relauncher_generic, which does essentially nothing other than enqueue an empty block
Specs: Windows 10 Pro 21H1, 5800x, 6700xt with driver 21.3.2, 16GB DDR4. It's a brand new PC on a brand new install of Windows, so there's not much else going on here. The code from the larger project this is derived from worked without issue on an R9 390 as of a few weeks ago, though unfortunately I no longer have that GPU to test with, as it has melted.
If you need any more info or anything else, please let me know!
Thanks for the report, we've reproduced the issue internally.
We've submitted the fix internally, but we missed backporting the change to the 21.10 driver (released 2 days ago). We'll update this issue once a public driver with this fix is available.
Thanks very much for the update, it's nice to see that this has been reproduced and fixed so quickly!