terminate called after throwing an instance of 'cl::sycl::compile_program_error'

Question

abhiTronix opened this issue 6 years ago · 13 comments

Hi, I have ComputeCpp 1.0.5:

********************************************************************************

ComputeCpp Info (CE 1.0.5)

SYCL 1.2.1 revision 3

********************************************************************************

Toolchain information:

GLIBC version: 2.27
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 4 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Carrizo
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2766.4
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
--------------------------------------------------------------------------------
Device 1:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Iceland
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2766.4
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
--------------------------------------------------------------------------------
Device 2:

  Device is supported                     : NO - Device does not support SPIR
  CL_DEVICE_NAME                          : AMD Radeon R6 Graphics (CARRIZO, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
  CL_DEVICE_VENDOR                        : AMD
  CL_DRIVER_VERSION                       : 18.2.2
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
--------------------------------------------------------------------------------
Device 3:

  Device is supported                     : NO - Device does not support SPIR
  CL_DEVICE_NAME                          : AMD Radeon (TM) R7 M360 (ICELAND, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
  CL_DEVICE_VENDOR                        : AMD
  CL_DRIVER_VERSION                       : 18.2.2
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.5/platform-support-notes

********************************************************************************

Here is the output of clinfo (spir64 is supported):

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2766.4)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.2.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 2
  Device Name                                     Carrizo
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2766.4)
  Driver Version                                  2766.4
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon R6 Graphics
  Device Topology (AMD)                           PCI-E, 00:01.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               6
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             800MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     6
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4099133440 (3.818GiB)
  Global free memory (AMD)                        7784060 (7.423GiB)
  Global memory channels (AMD)                    2
  Global memory banks per channel (AMD)           8
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           3924295680 (3.655GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        3924295680 (3.655GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1549961647793167641ns (Tue Feb 12 14:24:07 2019)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             0
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

  Device Name                                     Iceland
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2766.4)
  Driver Version                                  2766.4
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Board Name (AMD)                         AMD Radeon (TM) R7 M360
  Device Topology (AMD)                           PCI-E, 04:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               6
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1125MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     6
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2145787904 (1.998GiB)
  Global free memory (AMD)                        2075528 (1.979GiB)
  Global memory channels (AMD)                    2
  Global memory banks per channel (AMD)           8
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           1878712320 (1.75GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        1878712320 (1.75GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1549961647793167641ns (Tue Feb 12 14:24:07 2019)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             2415040557
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

  Platform Name                                   Clover
Number of devices                                 2
  Device Name                                     AMD Radeon R6 Graphics (CARRIZO, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.2.2
  Driver Version                                  18.2.2
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               6
  Max clock frequency                             800MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              7849488384 (7.31GiB)
  Error Correction support                        No
  Max memory allocation                           5874880512 (5.471GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

  Device Name                                     AMD Radeon (TM) R7 M360 (ICELAND, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.2.2
  Driver Version                                  18.2.2
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               6
  Max clock frequency                             1125MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              7849488384 (7.31GiB)
  Error Correction support                        No
  Max memory allocation                           5885116416 (5.481GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Carrizo
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (2)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Carrizo
    Device Name                                   Iceland
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Carrizo
    Device Name                                   Iceland
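
One way to cross-check the two listings above: in the clinfo output, the Device Extensions line of both AMD Accelerated Parallel Processing devices contains cl_khr_spir, while the line for the Clover devices does not, which matches the "Device does not support SPIR" verdict in the ComputeCpp report. A minimal C++ sketch of that check follows. Note that has_extension is a hypothetical helper name, not part of OpenCL or ComputeCpp, and the sketch assumes the extension list is the space-separated string that clinfo prints (in a program it would typically come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of OpenCL or ComputeCpp): returns true when
// `name` appears as a whole token in a space-separated extension list.
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    // operator>> skips runs of whitespace, so trailing spaces are harmless.
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

Matching whole tokens rather than substrings avoids a false positive on a longer extension name that merely begins with cl_khr_spir.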

But every ComputeCpp sample fails at run time, throwing cl::sycl::compile_program_error. Here is the gdb output, with backtrace (bt), for the simple-vector-add sample:

(gdb) run simple-vector-add
Starting program: /home/abhishek/computecpp-sdk/build/samples/simple-vector-add/simple-vector-add simple-vector-add
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5f14700 (LWP 11659)]
[New Thread 0x7ffff5713700 (LWP 11660)]
[New Thread 0x7ffff4f12700 (LWP 11661)]
[New Thread 0x7fffe61cd700 (LWP 11663)]
[New Thread 0x7fffe5049700 (LWP 11664)]
[New Thread 0x7fffe4848700 (LWP 11665)]
[New Thread 0x7fffd7fff700 (LWP 11666)]
[New Thread 0x7fffd77fe700 (LWP 11667)]
[New Thread 0x7fffd6ffd700 (LWP 11668)]
[New Thread 0x7fffd67fc700 (LWP 11669)]
[New Thread 0x7fffd5ffb700 (LWP 11670)]
[New Thread 0x7fffd56b9700 (LWP 11671)]
[New Thread 0x7fffd4eb8700 (LWP 11672)]
[New Thread 0x7fffb7fff700 (LWP 11673)]
[New Thread 0x7fffb77fe700 (LWP 11674)]
[New Thread 0x7fffb6ffd700 (LWP 11675)]
[New Thread 0x7fffb67fc700 (LWP 11676)]
[New Thread 0x7fffb5ffb700 (LWP 11677)]
[New Thread 0x7fffb57fa700 (LWP 11678)]
[New Thread 0x7fffb4ff9700 (LWP 11679)]
[New Thread 0x7fff97fff700 (LWP 11680)]
[Thread 0x7fff97fff700 (LWP 11680) exited]
[Thread 0x7fffb4ff9700 (LWP 11679) exited]
[Thread 0x7fffb57fa700 (LWP 11678) exited]
[Thread 0x7fffb5ffb700 (LWP 11677) exited]
[New Thread 0x7fffe6601700 (LWP 11681)]
terminate called after throwing an instance of 'cl::sycl::compile_program_error'

Thread 1 "simple-vector-a" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6b25801 in __GI_abort () at abort.c:79
#2  0x00007ffff717a8b7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7180a06 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7180a41 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7180c74 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff75a38ff in void cl::sycl::detail::handle_sycl_log<cl::sycl::compile_program_error>(cl::sycl::detail::sycl_log&&) () from /usr/local/computecpp/lib/libComputeCpp.so
#7  0x00007ffff759bd94 in cl::sycl::detail::trigger_sycl_log(cl::sycl::log_type, char const*, int, int, cl::sycl::detail::cpp_error_code, cl::sycl::detail::context const*, char const*) ()
   from /usr/local/computecpp/lib/libComputeCpp.so
#8  0x00007ffff7609c1a in cl::sycl::detail::program::handle_build_failure(int, cl::sycl::detail::cpp_error_code, cl::sycl::detail::program_state, std::shared_ptr<cl::sycl::detail::context> const&) ()
   from /usr/local/computecpp/lib/libComputeCpp.so
#9  0x00007ffff760a9f8 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#10 0x00007ffff760ace9 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#11 0x00007ffff75d37ae in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#12 0x00007ffff75d74d9 in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#13 0x000055555555ba45 in cl::sycl::program cl::sycl::program::create_program_for_kernel<SimpleVadd<int> >(cl::sycl::context) ()
#14 0x000055555555a7c1 in void cl::sycl::handler::parallel_for_impl<SimpleVadd<int>, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1}>(cl::sycl::detail::index_array const&, cl::sycl::detail::index_array, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1} const&) ()
#15 0x0000555555559771 in void cl::sycl::handler::parallel_for<SimpleVadd<int>, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1}, 1>(cl::sycl::range<1> const&, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1} const&) ()
#16 0x0000555555558007 in void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const
    ()
#17 0x000055555555a9f7 in cl::sycl::event cl::sycl::detail::command_group::submit_handler<void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}>(void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue> const&, cl::sycl::detail::standard_handler_tag) ()
#18 0x000055555555984c in cl::sycl::event cl::sycl::queue::submit<void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}>(void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}) ()
#19 0x000055555555846d in void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&) ()
#20 0x000055555555702a in main ()
(gdb)

Any help is appreciated.

Answer 1 · 2019-02-13T02:34:33.000Z

Here is the cmake output while building computecpp-sdk:

-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/lib/ccache/cc
-- Check for working C compiler: /usr/lib/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/lib/ccache/c++
-- Check for working CXX compiler: /usr/lib/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.2") 
-- platform - your system can support ComputeCpp
-- Found ComputeCpp: /usr/local/computecpp (found version "CE 1.0.5") 
-- compute++ flags - -O2;-mllvm;-inline-threshold=1000;-intelspirmetadata;-sycl-target;spir64
-- Configuring done
-- Generating done
-- Build files have been written to: /home/abhishek/computecpp-sdk/build

Answer 2 · 2019-02-13T09:07:46.000Z

I think AMD dropped SPIR support in ROCm and the latest AMD Pro drivers, since only a few of their clients need it. Also, only dummy SPIR support can be seen in the clinfo output above. Does anyone know which Pro driver was the last to support SPIR/SPIR-V?
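For anyone checking their own device: OpenCL reports extensions as a single space-separated string (CL_DEVICE_EXTENSIONS), and SPIR support is advertised as cl_khr_spir. The helper below is only a sketch; has_extension is a hypothetical name and is not part of ComputeCpp or OpenCL. It matches whole tokens so that a longer extension name that merely starts with cl_khr_spir is not mistaken for it.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `name` appears as a whole token in a
// space-separated OpenCL extension string (the CL_DEVICE_EXTENSIONS format).
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;  // exact match only, no prefix matching
        }
    }
    return false;
}
```

Note that a driver listing cl_khr_spir does not guarantee that it can actually build SPIR binaries, which appears to be the situation described in this thread.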

Answer 3 · 2019-02-13T10:36:19.000Z

Hi, it seems there is indeed an issue with SPIR in their latest driver.
I recommend that you keep using spir64 and install the driver we mention here: https://developer.codeplay.com/computecppce/latest/getting-started-with-tensorflow

Answer 4 · 2019-02-13T13:37:14.000Z

@Rbiessy Yes, AMD Pro 17.50 might be the appropriate driver, but it yields the following error:

Loading new amdgpu-17.50-511655 DKMS files...
Building for 4.15.0-45-generic
Building for architecture x86_64
Building initial module for 4.15.0-45-generic
Error! Bad return status for module build on kernel: 4.15.0-45-generic (x86_64)
Consult /var/lib/dkms/amdgpu/17.50-511655/build/make.log for more information.

These drivers only support Ubuntu 16.04.3 with kernel 4.10.XX-generic. In fact, the last kernel that 17.50 compiles fine under is 4.13.9, not the 4.15.0-45-generic kernel on my Ubuntu 18.04, so still no luck with the installation. At this point I can only install AMD Pro drivers 18.20 and later, but they don't support SPIR. Also, the ROCm drivers for TensorFlow don't support my Kaveri APUs and Iceland GPUs. This is really frustrating :(
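The kernel check described above can be sketched as a comparison of the numeric prefix of `uname -r` against the newest kernel a driver builds on. This is a minimal sketch: parse_kernel_version and kernel_supported are hypothetical names, and the 4.13.9 limit for 17.50 is the figure reported in this thread, not an official AMD number.

```cpp
#include <cstdio>
#include <string>
#include <tuple>

using KernelVersion = std::tuple<int, int, int>;

// Parses the leading "major.minor.patch" of a `uname -r` string such as
// "4.15.0-45-generic". Components that cannot be read stay 0.
KernelVersion parse_kernel_version(const std::string& release) {
    int major = 0, minor = 0, patch = 0;
    std::sscanf(release.c_str(), "%d.%d.%d", &major, &minor, &patch);
    return std::make_tuple(major, minor, patch);
}

// True if the running kernel is not newer than the last kernel the driver's
// DKMS module is known to build on (tuples compare lexicographically).
bool kernel_supported(const std::string& release,
                      const KernelVersion& max_supported) {
    return parse_kernel_version(release) <= max_supported;
}
```

With the values from this thread, kernel_supported("4.15.0-45-generic", std::make_tuple(4, 13, 9)) is false, which matches the DKMS build failure shown above.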

In the past AMD dropped the APP SDK, and now it has dropped SPIR/SPIR-V support for GPUs, focusing only on high-end graphics cards through ROCm. This is why NVIDIA is progressing so fast in this sector: they have proper support and appropriate drivers for their cards.

Answer 5 · 2019-02-13T17:02:34.000Z

Hi @abhiTronix, it is difficult to find drivers that work in all cases. The most success we've had is with the AMDGPU-PRO drivers that report "2482.3" as the version in clinfo (somewhat older than what you were running earlier). This is on an 18.04 system with an R9 Nano GPU. I'm afraid I don't have a download link to hand, but it should be the same line of drivers as you had installed, just older. I hope this helps!

Answer 6 · 2019-02-13T18:30:17.000Z

@DuncanMcBain Thanks for helping. Is this your card's clinfo output?

Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2482.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.

If yes, then the AMD driver version is amdgpu-pro-17.40.492261. I haven't tried it yet, but both amdgpu-pro-17.50 and amdgpu-pro-18.10 failed to work on my system. Can you please confirm your system's kernel version? It would be awesome if it works somehow 👍. Meanwhile, I'm preparing to dual boot Ubuntu 16.04 on my machine to set up the old AMD drivers.
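To compare driver builds across machines, the number in parentheses at the end of the "Platform Version" line can be pulled out mechanically. A minimal sketch, assuming the string has the shape shown above; driver_build is a hypothetical helper name, not part of any OpenCL API.

```cpp
#include <string>

// Hypothetical helper: extracts the text inside the last pair of parentheses,
// e.g. "2482.3" from "OpenCL 2.0 AMD-APP (2482.3)".
// Returns an empty string if no parenthesised part is found.
std::string driver_build(const std::string& platform_version) {
    const std::string::size_type close = platform_version.rfind(')');
    if (close == std::string::npos) {
        return "";
    }
    const std::string::size_type open = platform_version.rfind('(', close);
    if (open == std::string::npos) {
        return "";
    }
    return platform_version.substr(open + 1, close - open - 1);
}
```

The mapping from a build number such as 2482.3 to a driver package is not encoded here; the amdgpu-pro-17.40.492261 match is the one stated in this reply.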

Answer 7 · 2019-02-13T19:07:28.000Z

This isn't actually my machine, I was asking around the office. I'll have to confirm tomorrow. That is certainly the version though!

Answer 8 · 2019-02-14T11:24:20.000Z

Speaking to my colleague, apparently he's kept everything the same and is using the default kernel. Let us know if it works!

Answer 9 · 2019-02-14T11:42:36.000Z

@DuncanMcBain I guess not. The default kernel version on my Ubuntu 18.04.1 is 4.15.0-45-generic, which is incompatible with AMD drivers 18.20 or earlier, according to my tests and other forum results. Can you confirm the exact kernel version if possible? The output of uname -r, to be more specific, would be helpful.

Answer 10 · 2019-02-14T11:57:15.000Z

That is what it says on his PC too, so I don't know what to tell you!

Answer 11 · 2019-02-14T12:23:36.000Z

That's weird. AMD supports the R9 Nano but can't support the R7? Anyway, I'm working on getting my drivers to work on Ubuntu 16.04.

Answer 12 · 2019-02-14T12:27:40.000Z

So this driver won't install for you on 18.04 then? That's very strange, my colleague did have to try lots of different versions, but this was one that worked (and can run SPIR, he uses it as his development machine).

Answer 13 · 2019-02-14T15:10:35.000Z

Solved by dual booting Ubuntu 16.04 with AMD Pro drivers version 17.50. Thanks for the support. Kudos 👍