doe300/VC4CL

[System-Error] application hang when launching a simple example (HelloWorld)

sunneo opened this issue · 6 comments

https://github.com/rsnemmen/OpenCL-examples/tree/master/Hello_World
To make this program launch,
I change the SIZE to 16
and let local become 4 because the workgroupSize 12 cannot divide 1024
after build with gcc -o hello.exe hello.c -lOpenCL
the test program stuck at launch kernel.

[VC4CL](      hello.exe): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x5da5e4, const char* "clIcdGetPlatformIDsKHR")
[VC4CL](      hello.exe): get extension function address: clIcdGetPlatformIDsKHR
[VC4CL](      hello.exe): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x5da5e4, const char* "clGetPlatformInfo")
[VC4CL](      hello.exe): get extension function address: clGetPlatformInfo
[VC4CL](      hello.exe): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 0, cl_platform_id* 0, cl_uint* 0xbee72d48)
[VC4CL](      hello.exe): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 1, cl_platform_id* 0x5d56b8, cl_uint* 0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x5da5e4, cl_platform_info 2308, size_t 0, void* 0, size_t* 0xbee72ce0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x5da5e4, cl_platform_info 2308, size_t 226, void* 0x5e6458, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x5da5e4, cl_platform_info 2336, size_t 0, void* 0, size_t* 0xbee72ce0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x5da5e4, cl_platform_info 2336, size_t 6, void* 0x5da630, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clGetDeviceIDs(cl_platform_id 0x5da5e4, cl_device_type 4, cl_uint 1, cl_device_id* 0xbee73634, cl_uint* 0)
[VC4CL](      hello.exe): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0xbee73634, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0xbee72d18, void* 0, cl_int* 0xbee75640)
[VC4CL](      hello.exe): Tracking live-time of object: 0x5e587c (cl_context)
[VC4CL](      hello.exe): API call: cl_command_queue clCreateCommandQueue(cl_context 0x5e587c, cl_device_id 0x5da5f8, cl_command_queue_properties 0, cl_int* 0xbee75640)
[VC4CL](      hello.exe): Starting queue handler thread...
[VC4CL](      hello.exe): Tracking live-time of object: 0x5e441c (cl_command_queue)
[VC4CL](      hello.exe): API call: cl_program clCreateProgramWithSource(cl_context 0x5e587c, cl_uint 1, const char** 0x22080, const size_t* 0, cl_int* 0xbee75640)
[VC4CL](      hello.exe): Tracking live-time of object: 0x5a7744 (cl_program)
[VC4CL](      hello.exe): API call: cl_int clBuildProgram(cl_program 0x5a7744, cl_uint 0, const cl_device_id* 0, const char* (null), void(CL_CALLBACK*)(cl_program program, void* user_data) 0xbee72e00, void* 0)
[VC4CL](      hello.exe): Precompiling source with:
Dumping program sources to /tmp/vc4cl-source-386839851.cl
[VC4CL](      hello.exe): Dumping program IR to /tmp/vc4cl-ir-771476364.ll
[VC4CL](      hello.exe): Precompilation complete with status: 0
[VC4CL](      hello.exe): [VC4CL] base=0x3fc00000, mem=0xb6f54000
[VC4CL](      hello.exe): [VC4CL] V3D base: 0xb6f54000
[VC4CL](      hello.exe): Compiling source with:
[VC4CL](      hello.exe): Compilation complete with status: 0
Dumping program binaries to /tmp/vc4cl-binary-942724790.bin
[VC4CL](      hello.exe): API call: cl_kernel clCreateKernel(cl_program 0x5a7744, const char* "square", cl_int* 0xbee75640)
[VC4CL](      hello.exe): Tracking live-time of object: 0x5a8324 (cl_kernel)
[VC4CL](      hello.exe): API call: cl_mem clCreateBuffer(cl_context 0x5e587c, cl_mem_flags 4, size_t 4096, void* 0, cl_int* 0)
[VC4CL](      hello.exe): [VC4CL] Mailbox file descriptor opened: 4
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000001 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 80000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 38000000 08000000 00000000
[VC4CL](      hello.exe): Mailbox request: succeeded
[VC4CL](      hello.exe): Tracking live-time of object: 0x5da69c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00001000 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 00000004 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000004 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be8f8000 00000000 00000000
[VC4CL](      hello.exe): [VC4CL] base=0x3e8f8000, mem=0xb6f53000
[VC4CL](      hello.exe): Allocated 4096 bytes of buffer: handle 4, device address 0xbe8f8000, host address 0xb6f53000
[VC4CL](      hello.exe): API call: cl_mem clCreateBuffer(cl_context 0x5e587c, cl_mem_flags 2, size_t 4096, void* 0, cl_int* 0)
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 38000000 08000000 00000000
[VC4CL](      hello.exe): Mailbox request: succeeded
[VC4CL](      hello.exe): Tracking live-time of object: 0x5da97c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00001000 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 00000014 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000014 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be8f4000 00000000 00000000
[VC4CL](      hello.exe): [VC4CL] base=0x3e8f4000, mem=0xb6f52000
[VC4CL](      hello.exe): Allocated 4096 bytes of buffer: handle 20, device address 0xbe8f4000, host address 0xb6f52000
[VC4CL](      hello.exe): API call: cl_int clEnqueueWriteBuffer(cl_command_queue 0x5e441c, cl_mem 0x5da69c, cl_bool 1, size_t 0, size_t 4096, void* 0xbee74640, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](      hello.exe): Tracking live-time of object: 0x5d08cc (cl_event)
[VC4CL](      hello.exe): Releasing live-time of object: 0x5d08cc (cl_event)
[VC4CL](      hello.exe): API call: cl_int clSetKernelArg(cl_kernel 0x5a8324, cl_uint 0, size_t 4, const void* 0xbee73630)
[VC4CL](      hello.exe): Set kernel arg 0 for kernel 'square' to 0xbee73630 (6137500) with size 4
[VC4CL](      hello.exe): Kernel arg 0 for kernel 'square' is float* 'input' with size 4
[VC4CL](      hello.exe): Setting kernel-argument 0 to pointer 0x0x5da690
[VC4CL](      hello.exe): API call: cl_int clSetKernelArg(cl_kernel 0x5a8324, cl_uint 1, size_t 4, const void* 0xbee7362c)
[VC4CL](      hello.exe): Set kernel arg 1 for kernel 'square' to 0xbee7362c (6138236) with size 4
[VC4CL](      hello.exe): Kernel arg 1 for kernel 'square' is float* 'output' with size 4
[VC4CL](      hello.exe): Setting kernel-argument 1 to pointer 0x0x5da970
[VC4CL](      hello.exe): API call: cl_int clSetKernelArg(cl_kernel 0x5a8324, cl_uint 2, size_t 4, const void* 0xbee73628)
[VC4CL](      hello.exe): Set kernel arg 2 for kernel 'square' to 0xbee73628 (1024) with size 4
[VC4CL](      hello.exe): Kernel arg 2 for kernel 'square' is uint 'count' with size 4
[VC4CL](      hello.exe): Setting kernel-argument 2 to scalar 1024
[VC4CL](      hello.exe): API call: cl_int clGetKernelWorkGroupInfo(cl_kernel 0x5a8324, cl_device_id 0x5da5f8, cl_kernel_work_group_info 4528, size_t 4, void* 0xbee73638, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clEnqueueNDRangeKernel(cl_command_queue 0x5e441c, cl_kernel 0x5a8324, cl_uint 1, const size_t* 0, const size_t* 0xbee7363c, const size_t* 0xbee73638, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](      hello.exe): Tracking live-time of object: 0x5d08cc (cl_event)
[VC4CL](      hello.exe): API call: cl_int clFinish(cl_command_queue 0x5e441c)
[VC4CL](VC4CL Queue Han): Running kernel 'square' with 341 instructions...
Local sizes: 4 1 1 -> 4 QPUs
Global sizes: 1024 1 1 -> 256 work-groups (all at once)
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00001000 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 0000000f 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 0000000f 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be8f2000 00000000 00000000
[VC4CL](VC4CL Queue Han): [VC4CL] base=0x3e8f2000, mem=0xb6f51000
[VC4CL](VC4CL Queue Han): Allocated 4096 bytes of buffer: handle 15, device address 0xbe8f2000, host address 0xb6f51000
[VC4CL](VC4CL Queue Han): Reserving space for 12 stack-frames of 0 bytes each
[VC4CL](VC4CL Queue Han): Copied 2728 bytes of kernel code to device buffer
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 0(1024), 0(1), 0(1)
        Local IDs (sizes): 0(4), 0(1), 0(1)
        Group IDs (sizes): 0(256), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe8f8000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe8f4000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 1024
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 1(1024), 0(1), 0(1)
        Local IDs (sizes): 1(4), 0(1), 0(1)
        Group IDs (sizes): 0(256), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe8f8000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe8f4000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 1024
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 2(1024), 0(1), 0(1)
        Local IDs (sizes): 2(4), 0(1), 0(1)
        Group IDs (sizes): 0(256), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe8f8000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe8f4000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 1024
[VC4CL](VC4CL Queue Han): Setting work-item infos:
        1 dimensions with offsets: 0, 0, 0
        Global IDs (sizes): 3(1024), 0(1), 0(1)
        Local IDs (sizes): 3(4), 0(1), 0(1)
        Group IDs (sizes): 0(256), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe8f8000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe8f4000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 1024
[VC4CL](VC4CL Queue Han): 10 parameters set.
[VC4CL](VC4CL Queue Han): Dumping kernel buffer to /tmp/vc4cl-dump-square-1833488263.bin
[VC4CL](VC4CL Queue Han): Running work-group 0, 0, 0

^C


[VC4CL](      hello.exe): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x1ec5e4, const char* "clIcdGetPlatformIDsKHR")
[VC4CL](      hello.exe): get extension function address: clIcdGetPlatformIDsKHR
[VC4CL](      hello.exe): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x1ec5e4, const char* "clGetPlatformInfo")
[VC4CL](      hello.exe): get extension function address: clGetPlatformInfo
[VC4CL](      hello.exe): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 0, cl_platform_id* 0, cl_uint* 0xbe9fdcc8)
[VC4CL](      hello.exe): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 1, cl_platform_id* 0x1e76b8, cl_uint* 0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x1ec5e4, cl_platform_info 2308, size_t 0, void* 0, size_t* 0xbe9fdc60)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x1ec5e4, cl_platform_info 2308, size_t 226, void* 0x1f8458, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x1ec5e4, cl_platform_info 2336, size_t 0, void* 0, size_t* 0xbe9fdc60)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x1ec5e4, cl_platform_info 2336, size_t 6, void* 0x1ec630, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clGetDeviceIDs(cl_platform_id 0x1ec5e4, cl_device_type 4, cl_uint 1, cl_device_id* 0xbe9fe5b4, cl_uint* 0)
[VC4CL](      hello.exe): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0xbe9fe5b4, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0xbe9fdc98, void* 0, cl_int* 0xbe9fe640)
[VC4CL](      hello.exe): Tracking live-time of object: 0x1f787c (cl_context)
[VC4CL](      hello.exe): API call: cl_command_queue clCreateCommandQueue(cl_context 0x1f787c, cl_device_id 0x1ec5f8, cl_command_queue_properties 0, cl_int* 0xbe9fe640)
[VC4CL](      hello.exe): Starting queue handler thread...
[VC4CL](      hello.exe): Tracking live-time of object: 0x1f641c (cl_command_queue)
[VC4CL](      hello.exe): API call: cl_program clCreateProgramWithSource(cl_context 0x1f787c, cl_uint 1, const char** 0x22080, const size_t* 0, cl_int* 0xbe9fe640)
[VC4CL](      hello.exe): Tracking live-time of object: 0x1b9744 (cl_program)
[VC4CL](      hello.exe): API call: cl_int clBuildProgram(cl_program 0x1b9744, cl_uint 0, const cl_device_id* 0, const char* (null), void(CL_CALLBACK*)(cl_program program, void* user_data) 0xbe9fdd80, void* 0)
[VC4CL](      hello.exe): Precompiling source with:
Dumping program sources to /tmp/vc4cl-source-1365180540.cl
[VC4CL](      hello.exe): Dumping program IR to /tmp/vc4cl-ir-1540383426.ll
[VC4CL](      hello.exe): Precompilation complete with status: 0
[VC4CL](      hello.exe): [VC4CL] base=0x3fc00000, mem=0xb6fcf000
[VC4CL](      hello.exe): [VC4CL] V3D base: 0xb6fcf000
[VC4CL](      hello.exe): Compiling source with:
[VC4CL](      hello.exe): Compilation complete with status: 0
Dumping program binaries to /tmp/vc4cl-binary-304089172.bin
[VC4CL](      hello.exe): API call: cl_kernel clCreateKernel(cl_program 0x1b9744, const char* "square", cl_int* 0xbe9fe640)
[VC4CL](      hello.exe): Tracking live-time of object: 0x1ba324 (cl_kernel)
[VC4CL](      hello.exe): API call: cl_mem clCreateBuffer(cl_context 0x1f787c, cl_mem_flags 4, size_t 64, void* 0, cl_int* 0)
[VC4CL](      hello.exe): [VC4CL] Mailbox file descriptor opened: 4
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000001 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 80000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 38000000 08000000 00000000
[VC4CL](      hello.exe): Mailbox request: succeeded
[VC4CL](      hello.exe): Tracking live-time of object: 0x1ec69c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00000040 00001000 0000000c 00000000
[VC4CL](      hello.exe): ioctl_set_msg failed: -1
[VC4CL] Error in mbox_property: Connection timed out
terminate called after throwing an instance of 'std::system_error'
  what():  Failed to set mailbox property: Connection timed out
terminate called recursively
  • lsb_release
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
  • uname -a
Linux raspberrypi 5.4.70-v7+ #1 SMP Fri Oct 9 20:59:56 CST 2020 armv7l GNU/Linux
  • clinfo
pi@raspberrypi:~ $ sudo clinfo
Number of platforms                               1
  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
  Platform Vendor                                 doe300
  Platform Version                                OpenCL 1.2 VC4CL 0.4.9999 (b7fef0a)
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_khr_extended_versioning cl_khr_spirv_no_integer_wrap_decoration cl_vc4cl_performance_counters
  Platform Extensions function suffix             VC4CL

  Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
  Device Name                                     VideoCore IV GPU
  Device Vendor                                   Broadcom
  Device Vendor ID                                0x14e4
  Device Version                                  OpenCL 1.2 VC4CL 0.4.9999 (b7fef0a)
  Driver Version                                  0.4.9999
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  EMBEDDED_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               1
  Max clock frequency                             300MHz
  Core Temperature (Altera)                       41 C
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             12x12x12
  Max work group size                             12
  Preferred work group size multiple              1
  Preferred / native vector sizes
    char                                                16 / 16
    short                                               16 / 16
    int                                                 16 / 16
    long                                                 0 / 0
    half                                                 0 / 0        (n/a)
    float                                               16 / 16
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 Yes
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              134217728 (128MiB)
  Error Correction support                        No
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             64 bytes
  Alignment of base address                       512 bits (64 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768 (32KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   No
  Local memory type                               Global
  Local memory size                               134217728 (128MiB)
  Max number of constant args                     32
  Max constant buffer size                        134217728 (128MiB)
  Max size of kernel argument                     256
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    IL version                                    SPIR-V_1.5 SPIR_1.2
    SPIR versions                                 1.2
  printf() buffer size                            0
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_khr_extended_versioning cl_khr_spirv_no_integer_wrap_decoration cl_vc4cl_performance_counters

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  OpenCL for the Raspberry Pi VideoCore IV GPU
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [VC4CL]
  clCreateContext(NULL, ...) [default]            Success [VC4CL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                   VideoCore IV GPU

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12

I change the SIZE to 16

Which size did you set to 16?

Also, how long did you approximately wait before terminating the process?

I change the SIZE to 16

Which size did you set to 16?
#define DATA_SIZE (16)

Also, how long did you approximately wait before terminating the process?
it's about 1 minute, because it takes too long.

I've rebooted and re-run it, and I got execution failed
this time I didn't terminate it. and I just wait for execution finish.

[VC4CL](      hello.exe): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x194b5e4, const char* "clIcdGetPlatformIDsKHR")
[VC4CL](      hello.exe): get extension function address: clIcdGetPlatformIDsKHR
[VC4CL](      hello.exe): API call: void* clGetExtensionFunctionAddressForPlatform(cl_platform_id 0x194b5e4, const char* "clGetPlatformInfo")
[VC4CL](      hello.exe): get extension function address: clGetPlatformInfo
[VC4CL](      hello.exe): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 0, cl_platform_id* 0, cl_uint* 0xbeb57c88)
[VC4CL](      hello.exe): API call: cl_int clIcdGetPlatformIDsKHR(cl_uint 1, cl_platform_id* 0x19466b8, cl_uint* 0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x194b5e4, cl_platform_info 2308, size_t 0, void* 0, size_t* 0xbeb57c20)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x194b5e4, cl_platform_info 2308, size_t 226, void* 0x1957458, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x194b5e4, cl_platform_info 2336, size_t 0, void* 0, size_t* 0xbeb57c20)
[VC4CL](      hello.exe): API call: cl_int clGetPlatformInfo(cl_platform_id 0x194b5e4, cl_platform_info 2336, size_t 6, void* 0x194b630, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clGetDeviceIDs(cl_platform_id 0x194b5e4, cl_device_type 4, cl_uint 1, cl_device_id* 0xbeb58574, cl_uint* 0)
[VC4CL](      hello.exe): API call: cl_context clCreateContext(const cl_context_properties* 0, cl_uint 1, const cl_device_id* 0xbeb58574, void(CL_CALLBACK*)(const char* errinfo, const void* private_info, size_t cb, void* user_data) 0xbeb57c58, void* 0, cl_int* 0xbeb58600)
[VC4CL](      hello.exe): Tracking live-time of object: 0x195687c (cl_context)
[VC4CL](      hello.exe): API call: cl_command_queue clCreateCommandQueue(cl_context 0x195687c, cl_device_id 0x194b5f8, cl_command_queue_properties 0, cl_int* 0xbeb58600)
[VC4CL](      hello.exe): Starting queue handler thread...
[VC4CL](      hello.exe): Tracking live-time of object: 0x195541c (cl_command_queue)
[VC4CL](      hello.exe): API call: cl_program clCreateProgramWithSource(cl_context 0x195687c, cl_uint 1, const char** 0x22080, const size_t* 0, cl_int* 0xbeb58600)
[VC4CL](      hello.exe): Tracking live-time of object: 0x1918744 (cl_program)
[VC4CL](      hello.exe): API call: cl_int clBuildProgram(cl_program 0x1918744, cl_uint 0, const cl_device_id* 0, const char* (null), void(CL_CALLBACK*)(cl_program program, void* user_data) 0xbeb57d40, void* 0)
[VC4CL](      hello.exe): Precompiling source with: 
Dumping program sources to /tmp/vc4cl-source-1365180540.cl
[VC4CL](      hello.exe): Dumping program IR to /tmp/vc4cl-ir-1540383426.ll
[VC4CL](      hello.exe): Precompilation complete with status: 0
[VC4CL](      hello.exe): [VC4CL] base=0x3fc00000, mem=0xb6f85000
[VC4CL](      hello.exe): [VC4CL] V3D base: 0xb6f85000
[VC4CL](      hello.exe): Compiling source with: 
[VC4CL](      hello.exe): Compilation complete with status: 0
Dumping program binaries to /tmp/vc4cl-binary-304089172.bin
[VC4CL](      hello.exe): API call: cl_kernel clCreateKernel(cl_program 0x1918744, const char* "square", cl_int* 0xbeb58600)
[VC4CL](      hello.exe): Tracking live-time of object: 0x1919324 (cl_kernel)
[VC4CL](      hello.exe): API call: cl_mem clCreateBuffer(cl_context 0x195687c, cl_mem_flags 4, size_t 64, void* 0, cl_int* 0)
[VC4CL](      hello.exe): [VC4CL] Mailbox file descriptor opened: 4
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000001 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 80000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 38000000 08000000 00000000
[VC4CL](      hello.exe): Mailbox request: succeeded
[VC4CL](      hello.exe): Tracking live-time of object: 0x194b69c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00000040 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 00000012 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000012 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be6f9000 00000000 00000000
[VC4CL](      hello.exe): [VC4CL] base=0x3e6f9000, mem=0xb6f84000
[VC4CL](      hello.exe): Allocated 64 bytes of buffer: handle 18, device address 0xbe6f9000, host address 0xb6f84000
[VC4CL](      hello.exe): API call: cl_mem clCreateBuffer(cl_context 0x195687c, cl_mem_flags 2, size_t 64, void* 0, cl_int* 0)
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00010006 00000008 00000000 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00010006 00000008 80000008 38000000 08000000 00000000
[VC4CL](      hello.exe): Mailbox request: succeeded
[VC4CL](      hello.exe): Tracking live-time of object: 0x194b97c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00000040 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 00000014 00001000 0000000c 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000014 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be6f8000 00000000 00000000
[VC4CL](      hello.exe): [VC4CL] base=0x3e6f8000, mem=0xb6f83000
[VC4CL](      hello.exe): Allocated 64 bytes of buffer: handle 20, device address 0xbe6f8000, host address 0xb6f83000
[VC4CL](      hello.exe): API call: cl_int clEnqueueWriteBuffer(cl_command_queue 0x195541c, cl_mem 0x194b69c, cl_bool 1, size_t 0, size_t 64, void* 0xbeb585c0, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](      hello.exe): Tracking live-time of object: 0x19418cc (cl_event)
[VC4CL](      hello.exe): Releasing live-time of object: 0x19418cc (cl_event)
[VC4CL](      hello.exe): API call: cl_int clSetKernelArg(cl_kernel 0x1919324, cl_uint 0, size_t 4, const void* 0xbeb58570)
[VC4CL](      hello.exe): Set kernel arg 0 for kernel 'square' to 0xbeb58570 (26523292) with size 4
[VC4CL](      hello.exe): Kernel arg 0 for kernel 'square' is float* 'input' with size 4
[VC4CL](      hello.exe): Setting kernel-argument 0 to pointer 0x0x194b690
[VC4CL](      hello.exe): API call: cl_int clSetKernelArg(cl_kernel 0x1919324, cl_uint 1, size_t 4, const void* 0xbeb5856c)
[VC4CL](      hello.exe): Set kernel arg 1 for kernel 'square' to 0xbeb5856c (26524028) with size 4
[VC4CL](      hello.exe): Kernel arg 1 for kernel 'square' is float* 'output' with size 4
[VC4CL](      hello.exe): Setting kernel-argument 1 to pointer 0x0x194b970
[VC4CL](      hello.exe): API call: cl_int clSetKernelArg(cl_kernel 0x1919324, cl_uint 2, size_t 4, const void* 0xbeb58568)
[VC4CL](      hello.exe): Set kernel arg 2 for kernel 'square' to 0xbeb58568 (16) with size 4
[VC4CL](      hello.exe): Kernel arg 2 for kernel 'square' is uint 'count' with size 4
[VC4CL](      hello.exe): Setting kernel-argument 2 to scalar 16
[VC4CL](      hello.exe): API call: cl_int clGetKernelWorkGroupInfo(cl_kernel 0x1919324, cl_device_id 0x194b5f8, cl_kernel_work_group_info 4528, size_t 4, void* 0xbeb58578, size_t* 0)
[VC4CL](      hello.exe): API call: cl_int clEnqueueNDRangeKernel(cl_command_queue 0x195541c, cl_kernel 0x1919324, cl_uint 1, const size_t* 0, const size_t* 0xbeb5857c, const size_t* 0xbeb58578, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](      hello.exe): Tracking live-time of object: 0x19418cc (cl_event)
[VC4CL](      hello.exe): API call: cl_int clFinish(cl_command_queue 0x195541c)
[VC4CL](VC4CL Queue Han): Running kernel 'square' with 341 instructions...
Local sizes: 4 1 1 -> 4 QPUs
Global sizes: 16 1 1 -> 4 work-groups (all at once)
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000024 00000000 0003000c 0000000c 0000000c 00001000 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000024 80000000 0003000c 0000000c 80000004 00000013 00001000 0000000c 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000d 00000008 00000004 00000013 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000d 00000008 80000004 be6f4000 00000000 00000000
[VC4CL](VC4CL Queue Han): [VC4CL] base=0x3e6f4000, mem=0xb6f82000
[VC4CL](VC4CL Queue Han): Allocated 4096 bytes of buffer: handle 19, device address 0xbe6f4000, host address 0xb6f82000
[VC4CL](VC4CL Queue Han): Reserving space for 12 stack-frames of 0 bytes each
[VC4CL](VC4CL Queue Han): Copied 2728 bytes of kernel code to device buffer
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 0(16), 0(1), 0(1)
	Local IDs (sizes): 0(4), 0(1), 0(1)
	Group IDs (sizes): 0(4), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe6f9000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe6f8000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 16
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 1(16), 0(1), 0(1)
	Local IDs (sizes): 1(4), 0(1), 0(1)
	Group IDs (sizes): 0(4), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe6f9000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe6f8000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 16
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 2(16), 0(1), 0(1)
	Local IDs (sizes): 2(4), 0(1), 0(1)
	Group IDs (sizes): 0(4), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe6f9000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe6f8000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 16
[VC4CL](VC4CL Queue Han): Setting work-item infos:
	1 dimensions with offsets: 0, 0, 0
	Global IDs (sizes): 3(16), 0(1), 0(1)
	Local IDs (sizes): 3(4), 0(1), 0(1)
	Group IDs (sizes): 0(4), 0(1), 0(1)
[VC4CL](VC4CL Queue Han): Setting parameter 7 to buffer 0xbe6f9000
[VC4CL](VC4CL Queue Han): Setting parameter 8 to buffer 0xbe6f8000
[VC4CL](VC4CL Queue Han): Setting parameter 9 to scalar 16
[VC4CL](VC4CL Queue Han): 10 parameters set.
[VC4CL](VC4CL Queue Han): Dumping kernel buffer to /tmp/vc4cl-dump-square-1303455736.bin
[VC4CL](VC4CL Queue Han): Running work-group 0, 0, 0
[VC4CL](VC4CL Queue Han): Execution: failed
[VC4CL](VC4CL Queue Han): [VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000013 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000e 00000008 80000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000013 00000000 00000000
[VC4CL](VC4CL Queue Han): Mailbox buffer after: 00000020 80000000 0003000f 00000008 80000004 00000000 00000000 00000000
[VC4CL](VC4CL Queue Han): Deallocated 4096 bytes of buffer: handle 19, device address 0xbe6f4000, host address 0xb6f82000
[VC4CL](      hello.exe): Error in '/home/pi/opencl/VC4CL/src/CommandQueue.cpp:113', returning status -5
[VC4CL](      hello.exe): Releasing live-time of object: 0x19418cc (cl_event)
[VC4CL](      hello.exe): API call: cl_int clEnqueueReadBuffer(cl_command_queue 0x195541c, cl_mem 0x194b97c, cl_bool 1, size_t 0, size_t 64, void* 0xbeb58580, cl_uint 0, const cl_event* 0, cl_event* 0)
[VC4CL](      hello.exe): Tracking live-time of object: 0x19418cc (cl_event)
[VC4CL](      hello.exe): Releasing live-time of object: 0x19418cc (cl_event)
Computed '0/16' correct values!
[VC4CL](      hello.exe): API call: cl_int clReleaseMemObject(cl_mem 0x194b69c)
[VC4CL](      hello.exe): Releasing live-time of object: 0x194b69c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000012 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000e 00000008 80000004 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000012 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000f 00000008 80000004 00000000 00000000 00000000
[VC4CL](      hello.exe): Deallocated 64 bytes of buffer: handle 18, device address 0xbe6f9000, host address 0xb6f84000
[VC4CL](      hello.exe): API call: cl_int clReleaseMemObject(cl_mem 0x194b97c)
[VC4CL](      hello.exe): Releasing live-time of object: 0x194b97c (cl_mem)
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000e 00000008 00000004 00000014 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000e 00000008 80000004 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 0003000f 00000008 00000004 00000014 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 0003000f 00000008 80000004 00000000 00000000 00000000
[VC4CL](      hello.exe): Deallocated 64 bytes of buffer: handle 20, device address 0xbe6f8000, host address 0xb6f83000
[VC4CL](      hello.exe): API call: cl_int clReleaseProgram(cl_program 0x1918744)
[VC4CL](      hello.exe): API call: cl_int clReleaseKernel(cl_kernel 0x1919324)
[VC4CL](      hello.exe): Releasing live-time of object: 0x1919324 (cl_kernel)
[VC4CL](      hello.exe): Releasing live-time of object: 0x1918744 (cl_program)
[VC4CL](      hello.exe): API call: cl_int clReleaseCommandQueue(cl_command_queue 0x195541c)
[VC4CL](      hello.exe): Releasing live-time of object: 0x195541c (cl_command_queue)
[VC4CL](      hello.exe): API call: cl_int clReleaseContext(cl_context 0x195687c)
[VC4CL](      hello.exe): Releasing live-time of object: 0x195687c (cl_context)
[VC4CL](      hello.exe): Mailbox buffer before: 00000020 00000000 00030012 00000008 00000004 00000000 00000000 00000000
[VC4CL](      hello.exe): Mailbox buffer after: 00000020 80000000 00030012 00000008 80000004 80000000 00000000 00000000
[VC4CL](      hello.exe): [VC4CL] Mailbox file descriptor closed: 4
[VC4CL](      hello.exe): Stopping queue handler thread...
[VC4CL](VC4CL Queue Han): Queue handler thread stopped

It looks like the compiler generates invalid code here which hangs/runs in some infinte loop on the VC4 hardware. I will have to check the generated code to see, what goes wrong there.

generated.tar.gz

I've upload compressed file to mega,
in which includes generated files (ll, cl, bin, dot) comes from /tmp/vc4*

I can't see anything wrong with the generated code...

Did you compile the VC4CL project with the CMake configuration -DREGISTER_POKE_KERNELS=ON or use the debian package emitted by the CI?

Also, do you use the open source VC4 KMS driver or the closed-source OpenGL stack? See also #51 and #60

YES, You got me, thanks

I realized that my raspberry pi was configured with raspi-config to use FKMS (Fake-KMS) for Gaming.
After I switch it to legacy one (without GL), this problem won't occurred.

I'll close this issue