CNugteren/CLBlast

AMD SI GPUs: weird result

skn123 opened this issue · 14 comments

I raised an issue in the vexcl repo (ddemidov/vexcl#254), and the workaround provided there solved the problem. Could a similar workaround be provided for CLBlast as well, given that this library is used by Caffe?

OK, that looks like a weird problem. I'm not really sure I like the solution, but I've done something similar in the branch CLBlast-301-weird-AMD-Hainan-bug. It doesn't really work on my machine and I'm not sure what is wrong. Could you take a look as well? This is the commit:
e8dea34

OK. As I pointed out to Denis, Hainan is just one of the card names. The actual name should be SI (Southern Islands). I will check the code and get back to you.

Well, this is a first.
1.) If I do not enable AMD_HAINAN:
Completed SGEMM in 606.385 ms with status 0
2.) If I enable AMD_HAINAN, I get the following output:
#if SA == 1 && SB == 1
XgemmBody(kSizeM, kSizeN, kSizeK, agm, bgm, cgm, alpha, beta, alm, blm);
#elif SA == 1
XgemmBody(kSizeM, kSizeN, kSizeK, agm, bgm, cgm, alpha, beta, alm);
#elif SB == 1
XgemmBody(kSizeM, kSizeN, kSizeK, agm, bgm, cgm, alpha, beta, blm);
#else
XgemmBody(kSizeM, kSizeN, kSizeK, agm, bgm, cgm, alpha, beta);
#endif
}

#endif
// =================================================================================================

// End of the C++11 raw string literal

__kernel void null_kernel() {}

Segmentation fault (core dumped)

So it would be interesting to know what you have done with your .cl files that makes it work in the first place!

OK, I have just pushed new commits to the branch. Could you try again?

Segmentation fault (core dumped)

I could reproduce it and I've fixed that.

> The actual name should be SI (southern island)

OK, renamed it. AMD_SI_EMPTY_KERNEL_WORKAROUND is now the name of the CMake setting.
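
A minimal sketch of what such a source-level workaround can look like. It assumes, hypothetically, that the CMake option AMD_SI_EMPTY_KERNEL_WORKAROUND is forwarded to the C++ compiler as a preprocessor definition of the same name; the helper `WithEmptyKernelWorkaround` is made up for illustration and is not CLBlast's actual code. The appended line matches the `__kernel void null_kernel() {}` that appears at the end of the printed program source earlier in this thread:

```cpp
#include <cassert>
#include <string>

// Assumption: CMake forwards AMD_SI_EMPTY_KERNEL_WORKAROUND as a compile definition.
#ifdef AMD_SI_EMPTY_KERNEL_WORKAROUND
constexpr bool kEmptyKernelWorkaround = true;
#else
constexpr bool kEmptyKernelWorkaround = false;
#endif

// Hypothetical helper: when the workaround is enabled, append a trivial empty
// kernel to the OpenCL C program source before it is handed to the compiler.
std::string WithEmptyKernelWorkaround(const std::string& source,
                                      bool enabled = kEmptyKernelWorkaround) {
  if (!enabled) {
    return source;  // workaround off: leave the program source untouched
  }
  return source + "\n__kernel void null_kernel() {}\n";
}
```

The option itself would be switched on at configure time in the usual CMake way, e.g. `cmake -DAMD_SI_EMPTY_KERNEL_WORKAROUND=ON ..`.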

> So it would be interesting to know what have you done with your

I didn't change anything else, so the bug should still be there if you don't enable this workaround.

OK, the code builds and I tried a few examples:
naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ ./clblast_sample_cache_c
Starting caching sample with an empty cache
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
Completed routine with status -57 in 53.713 ms
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
Completed routine with status -57 in 0.795 ms
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
Completed routine with status -57 in 0.809 ms
Clearing cache
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
Completed routine with status -57 in 52.720 ms
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
Completed routine with status -57 in 0.845 ms
Clearing cache
Filling cache (this might take a while)
^C
naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ ./clblast_sample_sgemm_c
Completed SGEMM with status 0
naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ ./clblast_sample_sgemm
Completed SGEMM in 607.678 ms with status 0
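
The status -57 returned by the caching sample (and the -11 that shows up in the tuner run) are raw OpenCL error codes. A small lookup like the one below, a hypothetical helper rather than part of CLBlast's API, translates the codes seen in this thread into the symbolic names defined in the Khronos `CL/cl.h` header. Only a handful of codes are listed:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map a few standard OpenCL status codes (values as in
// CL/cl.h) to their names; anything else is reported as unknown.
std::string OpenCLErrorName(int status) {
  switch (status) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -57: return "CL_INVALID_EVENT_WAIT_LIST";
    case -58: return "CL_INVALID_EVENT";
    default:  return "unknown (" + std::to_string(status) + ")";
  }
}
```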

naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ ./clblast_tuner_xaxpy

  • Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -precision 32 (single) [=default]
    -n 4194304 [=default]
    -alpha 2.00 [=default]
    -fraction 1.00 [=default]
    -runs 10 [=default]
    -max_l2_norm 0.00 [=default]

  • Found 96 configuration(s)

  • Parameters explored: WGS WPT VW

| ID | total | param | compiles | time | GB/s | status |
x------x-------x----------------x----------------x--------------x--------x-------------------x
| ref | - | - | OK | 4.09 ms | - | reference OK |
x------x-------x----------------x----------------x--------------x--------x-------------------x
| 1 | 96 | 64 1 1 | OK 79 ms | 4.25 ms | 11.8 | results match |
| 2 | 96 | 64 1 2 | OK 59 ms | 4.25 ms | 11.8 | results match |
| 3 | 96 | 64 1 4 | OK 62 ms | 4.34 ms | 11.6 | results match |
| 4 | 96 | 64 1 8 | OK 68 ms | 4.70 ms | 10.7 | results match |
| 5 | 96 | 64 2 1 | OK 79 ms | 4.10 ms | 12.3 | results match |
| 6 | 96 | 64 2 2 | OK 63 ms | 4.15 ms | 12.1 | results match |
| 7 | 96 | 64 2 4 | OK 64 ms | 4.45 ms | 11.3 | results match |
| 8 | 96 | 64 2 8 | OK 67 ms | 4.84 ms | 10.4 | results match |
| 9 | 96 | 64 4 1 | OK 65 ms | 4.21 ms | 11.9 | results match |
| 10 | 96 | 64 4 2 | OK 85 ms | 4.21 ms | 11.9 | results match |
| 11 | 96 | 64 4 4 | OK 70 ms | 4.39 ms | 11.5 | results match |
| 12 | 96 | 64 4 8 | OK 76 ms | 4.91 ms | 10.2 | results match |
| 13 | 96 | 64 8 1 | OK 77 ms | 4.18 ms | 12.0 | results match |
| 14 | 96 | 64 8 2 | OK 74 ms | 4.31 ms | 11.7 | results match |
| 15 | 96 | 64 8 4 | OK 82 ms | 4.13 ms | 12.2 | results match |
| 16 | 96 | 64 8 8 | OK 97 ms | 4.89 ms | 10.3 | results match |
| 17 | 96 | 128 1 1 | OK 81 ms | 3.92 ms | 12.8 | results match |
| 18 | 96 | 128 1 2 | OK 61 ms | 4.06 ms | 12.4 | results match |
| 19 | 96 | 128 1 4 | OK 59 ms | 4.14 ms | 12.2 | results match |
| 20 | 96 | 128 1 8 | OK 87 ms | 4.74 ms | 10.6 | results match |
| 21 | 96 | 128 2 1 | OK 63 ms | 4.29 ms | 11.7 | results match |
| 22 | 96 | 128 2 2 | OK 63 ms | 4.37 ms | 11.5 | results match |
| 23 | 96 | 128 2 4 | OK 66 ms | 4.34 ms | 11.6 | results match |
| 24 | 96 | 128 2 8 | OK 71 ms | 4.72 ms | 10.7 | results match |
| 25 | 96 | 128 4 1 | OK 68 ms | 4.22 ms | 11.9 | results match |
| 26 | 96 | 128 4 2 | OK 72 ms | 4.41 ms | 11.4 | results match |
| 27 | 96 | 128 4 4 | OK 72 ms | 4.57 ms | 11.0 | results match |
| 28 | 96 | 128 4 8 | OK 79 ms | 5.18 ms | 9.7 | results match |
| 29 | 96 | 128 8 1 | OK 75 ms | 4.39 ms | 11.5 | results match |
| 30 | 96 | 128 8 2 | OK 82 ms | 4.18 ms | 12.0 | results match |
| 31 | 96 | 128 8 4 | OK 82 ms | 4.55 ms | 11.1 | results match |
| 32 | 96 | 128 8 8 | OK 98 ms | 5.05 ms | 10.0 | results match |
| 33 | 96 | 256 1 1 | OK 57 ms | 4.13 ms | 12.2 | results match |
| 34 | 96 | 256 1 2 | OK 57 ms | 3.98 ms | 12.6 | results match |
| 35 | 96 | 256 1 4 | OK 58 ms | 4.02 ms | 12.5 | results match |
| 36 | 96 | 256 1 8 | OK 63 ms | 4.58 ms | 11.0 | results match |
| 37 | 96 | 256 2 1 | OK 59 ms | 4.18 ms | 12.0 | results match |
| 38 | 96 | 256 2 2 | OK 80 ms | 4.00 ms | 12.6 | results match |
| 39 | 96 | 256 2 4 | OK 63 ms | 4.29 ms | 11.7 | results match |
| 40 | 96 | 256 2 8 | OK 68 ms | 4.79 ms | 10.5 | results match |
| 41 | 96 | 256 4 1 | OK 63 ms | 4.10 ms | 12.3 | results match |
| 42 | 96 | 256 4 2 | OK 65 ms | 4.05 ms | 12.4 | results match |
| 43 | 96 | 256 4 4 | OK 69 ms | 4.38 ms | 11.5 | results match |
| 44 | 96 | 256 4 8 | OK 75 ms | 4.76 ms | 10.6 | results match |
| 45 | 96 | 256 8 1 | OK 71 ms | 4.32 ms | 11.7 | results match |
| 46 | 96 | 256 8 2 | OK 88 ms | 4.09 ms | 12.3 | results match |
| 47 | 96 | 256 8 4 | OK 81 ms | 4.22 ms | 11.9 | results match |
| 48 | 96 | 256 8 8 | OK 99 ms | 4.98 ms | 10.1 | results match |
| 49 | 96 | 512 1 1 | compilation error: -11 | - | - | <-- skipping
| 50 | 96 | 512 1 2 | compilation error: -11 | - | - | <-- skipping
| 51 | 96 | 512 1 4 | compilation error: -11 | - | - | <-- skipping
| 52 | 96 | 512 1 8 | compilation error: -11 | - | - | <-- skipping
| 53 | 96 | 512 2 1 | compilation error: -11 | - | - | <-- skipping
| 54 | 96 | 512 2 2 | compilation error: -11 | - | - | <-- skipping
| 55 | 96 | 512 2 4 | compilation error: -11 | - | - | <-- skipping
| 56 | 96 | 512 2 8 | compilation error: -11 | - | - | <-- skipping
| 57 | 96 | 512 4 1 | compilation error: -11 | - | - | <-- skipping
| 58 | 96 | 512 4 2 | compilation error: -11 | - | - | <-- skipping
| 59 | 96 | 512 4 4 | compilation error: -11 | - | - | <-- skipping
| 60 | 96 | 512 4 8 | compilation error: -11 | - | - | <-- skipping
| 61 | 96 | 512 8 1 | compilation error: -11 | - | - | <-- skipping
| 62 | 96 | 512 8 2 | compilation error: -11 | - | - | <-- skipping
| 63 | 96 | 512 8 4 | compilation error: -11 | - | - | <-- skipping
| 64 | 96 | 512 8 8 | compilation error: -11 | - | - | <-- skipping
| 65 | 96 | 1024 1 1 | compilation error: -11 | - | - | <-- skipping
| 66 | 96 | 1024 1 2 | compilation error: -11 | - | - | <-- skipping
| 67 | 96 | 1024 1 4 | compilation error: -11 | - | - | <-- skipping
| 68 | 96 | 1024 1 8 | compilation error: -11 | - | - | <-- skipping
| 69 | 96 | 1024 2 1 | compilation error: -11 | - | - | <-- skipping
| 70 | 96 | 1024 2 2 | compilation error: -11 | - | - | <-- skipping
| 71 | 96 | 1024 2 4 | compilation error: -11 | - | - | <-- skipping
| 72 | 96 | 1024 2 8 | compilation error: -11 | - | - | <-- skipping
| 73 | 96 | 1024 4 1 | compilation error: -11 | - | - | <-- skipping
| 74 | 96 | 1024 4 2 | compilation error: -11 | - | - | <-- skipping
| 75 | 96 | 1024 4 4 | compilation error: -11 | - | - | <-- skipping
| 76 | 96 | 1024 4 8 | compilation error: -11 | - | - | <-- skipping
| 77 | 96 | 1024 8 1 | compilation error: -11 | - | - | <-- skipping
| 78 | 96 | 1024 8 2 | compilation error: -11 | - | - | <-- skipping
| 79 | 96 | 1024 8 4 | compilation error: -11 | - | - | <-- skipping
| 80 | 96 | 1024 8 8 | compilation error: -11 | - | - | <-- skipping
| 81 | 96 | 2048 1 1 | compilation error: -11 | - | - | <-- skipping
| 82 | 96 | 2048 1 2 | compilation error: -11 | - | - | <-- skipping
| 83 | 96 | 2048 1 4 | compilation error: -11 | - | - | <-- skipping
| 84 | 96 | 2048 1 8 | compilation error: -11 | - | - | <-- skipping
| 85 | 96 | 2048 2 1 | compilation error: -11 | - | - | <-- skipping
| 86 | 96 | 2048 2 2 | compilation error: -11 | - | - | <-- skipping
| 87 | 96 | 2048 2 4 | compilation error: -11 | - | - | <-- skipping
| 88 | 96 | 2048 2 8 | compilation error: -11 | - | - | <-- skipping
| 89 | 96 | 2048 4 1 | compilation error: -11 | - | - | <-- skipping
| 90 | 96 | 2048 4 2 | compilation error: -11 | - | - | <-- skipping
| 91 | 96 | 2048 4 4 | compilation error: -11 | - | - | <-- skipping
| 92 | 96 | 2048 4 8 | compilation error: -11 | - | - | <-- skipping
| 93 | 96 | 2048 8 1 | compilation error: -11 | - | - | <-- skipping
| 94 | 96 | 2048 8 2 | compilation error: -11 | - | - | <-- skipping
| 95 | 96 | 2048 8 4 | compilation error: -11 | - | - | <-- skipping
| 96 | 96 | 2048 8 8 | compilation error: -11 | - | - | <-- skipping
x------x-------x----------------x----------------x--------------x--------x-------------------x

  • Got average result of 4.38 ms: 11.5 GB/s

  • Found best result 3.92 ms: 12.8 GB/s

  • Best parameters: PRECISION=32 VW=1 WGS=128 WPT=1

  • Writing a total of 48 results to 'clblast_xaxpy_32.json'

  • Completed tuning process
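
The tuner output follows a simple pattern: the 96 configurations are every combination of WGS in {64, 128, 256, 512, 1024, 2048}, WPT in {1, 2, 4, 8} and VW in {1, 2, 4, 8}, and the 48 that failed with -11 (CL_BUILD_PROGRAM_FAILURE) are exactly those with WGS of 512 or more, all above the "Max work group size: 256" that clinfo reports for the R5 M330. Whether that device limit is the actual cause of the build failures is an assumption. The sketch below rebuilds the search space from the table and filters it by a work-group limit; `Config`, `XaxpySearchSpace` and `FitsDevice` are hypothetical names, not CLBlast's tuner code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Xaxpy tuning configuration, in the order the tuner prints it: WGS WPT VW.
struct Config {
  std::size_t wgs;  // work-group size
  std::size_t wpt;  // work per thread
  std::size_t vw;   // vector width
};

// Rebuild the 96-point search space shown in the tuner table above.
// The value lists are read off the table, not taken from CLBlast's source.
std::vector<Config> XaxpySearchSpace() {
  const std::size_t wgs_values[] = {64, 128, 256, 512, 1024, 2048};
  const std::size_t small_values[] = {1, 2, 4, 8};
  std::vector<Config> configs;
  for (std::size_t wgs : wgs_values)
    for (std::size_t wpt : small_values)
      for (std::size_t vw : small_values)
        configs.push_back({wgs, wpt, vw});
  return configs;
}

// Keep only configurations whose work-group size fits the device limit.
std::vector<Config> FitsDevice(const std::vector<Config>& configs,
                               std::size_t max_work_group_size) {
  std::vector<Config> result;
  for (const Config& c : configs)
    if (c.wgs <= max_work_group_size) result.push_back(c);
  return result;
}
```

With a limit of 256 this keeps 48 of the 96 configurations, the same count as the 48 results the tuner wrote to `clblast_xaxpy_32.json`.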

There is a bug that I have also reported here:
Shark-ML/Remora#14

Results of clinfo:
naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ clinfo
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (2639.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1
Platform Name: Intel(R) OpenCL HD Graphics
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon (TM) R5 M330
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 5
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 750Mhz
Address bits: 64
Max memory allocation: 1596905472
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 2146349056
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7fc6610699f0
Name: Hainan
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2639.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2639.3)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Platform Name: Intel(R) OpenCL HD Graphics
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 8086h
Max compute units: 24
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1100Mhz
Address bits: 64
Max memory allocation: 4294959104
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 128
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 524288
Global memory size: 13386412032
Constant buffer size: 4294959104
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 1
Max pipe packet size: 1024
Max global variable size: 65536
Max global variable preferred total size: 4294959104
Max read/write image args: 128
Max on device events: 1024
Queue on device max size: 67108864
Max on device queues: 1
Queue on device preferred size: 131072
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 64
Preferred global atomic alignment: 64
Preferred local atomic alignment: 64
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 83
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: Yes
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x2045550
Name: Intel(R) Gen9 HD Graphics NEO
Vendor: Intel(R) Corporation
Device OpenCL C version: OpenCL C 2.0
Driver version: 18.29.11114
Profile: FULL_PROFILE
Version: OpenCL 2.1 NEO
Extensions: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation

OK, so what are you saying exactly? I see error -57, which indicates a CL_INVALID_EVENT_WAIT_LIST issue; I'll look into it. But more importantly, perhaps you should run the tests (compile with -DTESTS=ON) and check whether the issue is resolved, i.e. whether the library produces correct results.
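
For reference, the OpenCL specification says clEnqueueNDRangeKernel returns CL_INVALID_EVENT_WAIT_LIST when `event_wait_list` is NULL but `num_events_in_wait_list` is non-zero, when the pointer is non-NULL but the count is zero, or when an entry is not a valid event. One possible, unconfirmed, cause in a workaround like this is a null or never-created event being forwarded as a dependency. The hypothetical sketch below (not CLBlast's code) drops null handles and keeps the count and pointer consistent; it is templated on the handle type so it compiles without the OpenCL headers, whereas real code would use `cl_event` and `cl_uint`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wait-list wrapper; Event stands in for cl_event (a pointer type).
template <typename Event>
struct WaitList {
  std::vector<Event> events;
  // Value to pass as num_events_in_wait_list.
  unsigned int count() const { return static_cast<unsigned int>(events.size()); }
  // Value to pass as event_wait_list: nullptr whenever the list is empty.
  const Event* data() const { return events.empty() ? nullptr : events.data(); }
};

// Build a wait list from candidate dependencies, skipping null handles.
template <typename Event>
WaitList<Event> MakeWaitList(const std::vector<Event>& candidates) {
  WaitList<Event> list;
  for (const Event& e : candidates) {
    if (e != nullptr) list.events.push_back(e);
  }
  // Invariant required by the spec: an empty list goes with a null pointer.
  assert((list.count() == 0) == (list.data() == nullptr));
  return list;
}
```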

naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ ./clblast_test_xherk

  • Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -full_test [false]
    -verbose [false]
    -cblas 1 [=default]

  • Running on OpenCL device 'AMD Radeon (TM) R5 M330'.

  • Starting tests for the 'CHERK' routine. Legend:
    : -> Test produced correct results
    . -> Test returned the correct error code
    X -> Test produced incorrect results
    / -> Test returned an incorrect error code
    \ -> Test not executed: OpenCL-kernel compilation error
    o -> Test not executed: Unsupported precision
    - -> Test not completed: Reference CBLAS doesn't output error codes
  • Testing with error margins of 0.5% (relative) and 0.001 (absolute)

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Completed all test-cases for this routine. Results:
    0 test(s) passed
    52 test(s) skipped
    76 test(s) failed

  • Running on OpenCL device 'AMD Radeon (TM) R5 M330'.

  • Starting tests for the 'ZHERK' routine. Legend:
    : -> Test produced correct results
    . -> Test returned the correct error code
    X -> Test produced incorrect results
    / -> Test returned an incorrect error code
    \ -> Test not executed: OpenCL-kernel compilation error
    o -> Test not executed: Unsupported precision
    - -> Test not completed: Reference CBLAS doesn't output error codes
  • Testing with error margins of 0.5% (relative) and 0.001 (absolute)

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 6 skipped / 10 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 113 (conjugate)':
    CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /--CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /-CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /---CLBlast: OpenCL error: clEnqueueNDRangeKernel: -57
    /
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=7 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=7 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=7 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=7 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Status code -57 (expected 0): n=64 k=64 lda=64 ldc=64 offa=0 offc=0 alpha=3.14 beta=3.14
    Pass rate 0.0%: 0 passed / 7 skipped / 9 failed

  • Completed all test-cases for this routine. Results:
    0 test(s) passed
    52 test(s) skipped
    76 test(s) failed

I think this is the test where the error shows up.

OK, you are right, that is indeed an issue. I've fixed it now, but I'm not sure it still fixes your original issue, since the kernel is now enqueued afterwards but no longer depends on events. Could you try again?
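For context, status -57 in the failing runs above is `CL_INVALID_EVENT_WAIT_LIST`. The OpenCL specification lists this error for `clEnqueueNDRangeKernel` when `num_events_in_wait_list` and `event_wait_list` disagree (a non-zero count with a NULL list, or a non-NULL list with a count of zero), or when the list contains invalid events. The sketch below mirrors that rule so it can be read in isolation. `CheckEventWaitList` and `FakeEvent` are hypothetical names, not CLBlast or OpenCL API; "invalid event" is modelled simply as a null entry, and whether the workaround hit exactly this case is an assumption.

```cpp
#include <cstddef>

// Status values as defined in the OpenCL headers (CL/cl.h), redefined here
// so the sketch compiles without an OpenCL SDK.
constexpr int kClSuccess = 0;
constexpr int kClInvalidEventWaitList = -57;

// Hypothetical stand-in for the opaque cl_event handle.
using FakeEvent = const void*;

// Illustrative version of the wait-list consistency check described for
// clEnqueueNDRangeKernel: the count and the pointer must agree, and every
// listed event must be valid (modelled here as "not null").
int CheckEventWaitList(std::size_t num_events, const FakeEvent* wait_list) {
  if (wait_list == nullptr && num_events > 0) {
    return kClInvalidEventWaitList;  // a count but no list
  }
  if (wait_list != nullptr && num_events == 0) {
    return kClInvalidEventWaitList;  // a list but a count of zero
  }
  for (std::size_t i = 0; i < num_events; ++i) {
    if (wait_list[i] == nullptr) {
      return kClInvalidEventWaitList;  // an invalid event in the list
    }
  }
  return kClSuccess;
}
```

In this model, enqueuing with `0` and `nullptr` (no dependencies at all) is always consistent.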

As I mentioned, your code seems to work even without the AMD_SI hack! I wonder why :)
Now, here is the test after applying your changes:
naths@naths-HP-Laptop-15-bs1xx:~/build/clblast$ ./clblast_test_xherk

  • Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -full_test [false]
    -verbose [false]
    -cblas 1 [=default]

  • Running on OpenCL device 'AMD Radeon (TM) R5 M330'.

  • Starting tests for the 'CHERK' routine. Legend:
    : -> Test produced correct results
    . -> Test returned the correct error code
    X -> Test produced incorrect results
    / -> Test returned an incorrect error code
    \ -> Test not executed: OpenCL-kernel compilation error
    o -> Test not executed: Unsupported precision
    - -> Test not completed: Reference CBLAS doesn't output error codes
  • Testing with error margins of 0.5% (relative) and 0.001 (absolute)

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 113 (conjugate)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 113 (conjugate)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 113 (conjugate)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 113 (conjugate)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Completed all test-cases for this routine. Results:
    76 test(s) passed
    52 test(s) skipped
    0 test(s) failed

  • Running on OpenCL device 'AMD Radeon (TM) R5 M330'.

  • Starting tests for the 'ZHERK' routine. Legend:
    : -> Test produced correct results
    . -> Test returned the correct error code
    X -> Test produced incorrect results
    / -> Test returned an incorrect error code
    \ -> Test not executed: OpenCL-kernel compilation error
    o -> Test not executed: Unsupported precision
    - -> Test not completed: Reference CBLAS doesn't output error codes
  • Testing with error margins of 0.5% (relative) and 0.001 (absolute)

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 111 (regular)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 111 (regular)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Testing 'regular behaviour' for '101 (row-major) 121 (upper) 113 (conjugate)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '101 (row-major) 122 (lower) 113 (conjugate)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 111 (regular)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 111 (regular)':
    ::::::::---:---:
    Pass rate 62.5%: 10 passed / 6 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 121 (upper) 113 (conjugate)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Testing 'regular behaviour' for '102 (col-major) 122 (lower) 113 (conjugate)':
    ::::--::-:-:---:
    Pass rate 56.2%: 9 passed / 7 skipped / 0 failed

  • Completed all test-cases for this routine. Results:
    76 test(s) passed
    52 test(s) skipped
    0 test(s) failed
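The progress strings in this run can be checked by hand against the legend: `::::--::-:-:---:` contains nine `:` (passed) and seven `-` (skipped), so the pass rate is 9/16 = 56.25%, printed as 56.2%; skipped cases count towards the denominator. A minimal sketch of that tally follows. `CountResults` and `PassRatePercent` are hypothetical helpers, not CLBlast's tester code, and treating `\` and `o` as skipped is an assumption based on the legend's "not executed" wording. It only works on clean progress lines: lines with interleaved `CLBlast: OpenCL error` messages contain stray `:` and `-` characters.

```cpp
#include <string>

// Tally of one progress line such as "::::--::-:-:---:".
struct TestTally {
  int passed = 0;   // ':' correct results, '.' correct error code
  int skipped = 0;  // '-', '\\', 'o' (assumed: not executed / not completed)
  int failed = 0;   // 'X' incorrect results, '/' incorrect error code
};

// Counts legend symbols in a progress line; other characters are ignored.
TestTally CountResults(const std::string& progress) {
  TestTally t;
  for (char c : progress) {
    switch (c) {
      case ':': case '.': ++t.passed; break;
      case '-': case '\\': case 'o': ++t.skipped; break;
      case 'X': case '/': ++t.failed; break;
      default: break;
    }
  }
  return t;
}

// Pass rate over all cases, skipped ones included, as a percentage.
double PassRatePercent(const TestTally& t) {
  const int total = t.passed + t.skipped + t.failed;
  return total == 0 ? 0.0 : 100.0 * t.passed / total;
}
```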

Ok, now when I use your updated code with Remora, I get the following error:
naths@naths-HP-Laptop-15-bs1xx:~/build/remora/bin$ ./Benchmark_GPU_Conv2D
performance float
Flops
35 4 8 32 14446.3
67 4 8 32 16223.2
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
131 4 8 32 64106.2
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -5
.....

OK, now I'm confused. So if the original code did work, why did you open this issue? Do all tests (make test) pass? So should I merge it in or not?

As for the other thing, that is a non-CLBlast program, so I'm afraid you'll have to debug it yourself. OpenCL error -5 is CL_OUT_OF_RESOURCES, so that should give you a hint :-)
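The logs in this thread print raw status numbers (`clEnqueueNDRangeKernel: -5`, `-57`). A small lookup such as the one below makes them easier to read. `OpenCLErrorName` is a hypothetical helper, not part of CLBlast or the OpenCL API; the values are those defined in `CL/cl.h`, and only a handful of codes are listed. Per the OpenCL specification, `CL_OUT_OF_RESOURCES` from `clEnqueueNDRangeKernel` can mean, for example, that the requested work-group needs more registers or local memory than the device provides.

```cpp
#include <string>

// Maps a few raw OpenCL status codes to their symbolic names (values as in
// CL/cl.h). Unlisted codes are reported as "unknown".
std::string OpenCLErrorName(int code) {
  switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -57: return "CL_INVALID_EVENT_WAIT_LIST";
    case -58: return "CL_INVALID_EVENT";
    default:  return "unknown OpenCL error " + std::to_string(code);
  }
}
```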

Yes, you can close this issue and merge the code.

OK, I've just made a PR and will merge it as soon as the tests finish. I've also added a note to the README.