Sort algorithms fail running on AMD Radeon RX Vega 56
rosenrodt opened this issue · 12 comments
Similar to the issue reported in #795, I am seeing all kinds of sort() failures on AMD RX Vega GPUs.
I ran the tests with the command ctest --output-on-failure. Here is the summary:
Test result for driver Adrenalin 18.5.2 & 18.9.1
The following tests FAILED:
54 - algorithm.radix_sort (Failed)
55 - algorithm.radix_sort_by_key (Failed)
75 - algorithm.sort_by_key (Failed)
77 - algorithm.stable_sort (Failed)
143 - misc.amd_cpp_kernel_language (Failed)
148 - example.amd_cpp_kernel (Exit code 0xc0000409)
Test result for driver Adrenalin 18.12.2
The following tests FAILED:
41 - algorithm.insertion_sort (Failed)
45 - algorithm.merge_sort_gpu (Failed)
54 - algorithm.radix_sort (Failed)
55 - algorithm.radix_sort_by_key (Failed)
74 - algorithm.sort (Failed)
75 - algorithm.sort_by_key (Failed)
77 - algorithm.stable_sort (Failed)
143 - misc.amd_cpp_kernel_language (Failed)
148 - example.amd_cpp_kernel (Exit code 0xc0000409)
Bold items (insertion_sort, merge_sort_gpu, and sort) are tests that failed on the new driver but not on the older drivers.
Curiously, it gets even worse with the latest driver.
Note 1. For complete failure reports look here.
Note 2. As I recall, AMD Radeon HD 6770 passes every test (maybe except for the amd_cpp_kernel tests)
Back with a quick update: Adrenalin 18.12.2 doesn't seem like a quality driver, so let's just ignore it. I believe this issue is caused by pointer aliasing inside the radix sort kernels. I will post a PR as soon as that has been confirmed.
Which pointers in the loop are aliasing each other?
It turns out the problem in the scan() kernel I checked is not pointer aliasing (see the work-in-progress PR #812). Rather, the AMD OpenCL compiler fails to detect a dependency between variables.
On the other hand, I am seeing a really interesting bug with the AMD driver. The comparison operators for the char and uchar types are not working, so all the errors I get from the tests are emitted by is_sorted() when testing on char types. Fortunately, the equality checks for sorted char arrays all pass.
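For anyone trying to reproduce this, a host-only reference can help separate a wrong device result from a wrong expectation. The sketch below is an assumption-laden illustration, not code from Boost.Compute or from PR #812: host_is_sorted, bytes_as and check_host_reference are hypothetical names, and the byte values are arbitrary. It only shows what a correct ordering check should return for the same bit patterns read as signed and as unsigned 8-bit values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side reference: true when `values` is in non-decreasing order under
// the ordinary `<` of type T. No OpenCL device is involved here.
template <typename T>
bool host_is_sorted(const std::vector<T>& values) {
    return std::is_sorted(values.begin(), values.end());
}

// Converts each byte to T (signed char or unsigned char) so the same bit
// patterns can be checked under both signednesses. Assumes the usual
// two's-complement conversion, where 0x80 becomes -128 as signed char.
template <typename T>
std::vector<T> bytes_as(const std::vector<unsigned char>& bytes) {
    std::vector<T> out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<T>(b));
    }
    return out;
}

// Hypothetical self-check of the reference itself.
inline void check_host_reference() {
    const std::vector<unsigned char> bytes = {0x00, 0x01, 0x7F, 0x80, 0xFF};
    // Read as unsigned: 0, 1, 127, 128, 255 (sorted).
    assert(host_is_sorted(bytes_as<unsigned char>(bytes)));
    // Read as signed: 0, 1, 127, -128, -1 (not sorted).
    assert(!host_is_sorted(bytes_as<signed char>(bytes)));
}
```

In a real reproduction, the same data would be copied to the device and the result of boost::compute::is_sorted() compared with host_is_sorted(). A mismatch would point at the device-side comparison rather than at the test data.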
Can you check Adrenalin 19.1.1?
It would also be great to open an issue on AMD.
By the way, I guess we should disable or remove the AMD C++ tests. They're probably not supported in new OpenCL drivers.
No luck with 19.1.1 either.
Fixed in #812. @rosenrodt, are you planning to open a bug report for the AMD driver?
I think https://community.amd.com/community/devgurus/opencl is the best place for that. However, they may ask you to make a small, independent program for bug reproduction.
I'll post on the forum sometime this week
Back with some updates :)
I opened a ticket to report the driver bug that appears on Adrenalin 18.12.2 and onwards https://community.amd.com/message/2897171. The mod can't repro the issue on Hawaii GPUs (no surprise, as it only happens on recent GPUs) so he is passing it to the relevant teams
As for the memory fence workaround, I will post it as a separate bug report on the AMD forum.
Though not directly confirmed by AMD staff yet, the issue of comparing char types using boost::compute::is_sorted() seems to be resolved as of Adrenalin 19.3.2. Both the standalone minimal test sample and Boost Compute master branch now work as expected.
So I consider this issue resolved.