terminate called after throwing an instance of 'cl::sycl::compile_program_error'
abhiTronix opened this issue · 13 comments
Hi, I have ComputeCpp 1.0.5:
********************************************************************************
ComputeCpp Info (CE 1.0.5)
SYCL 1.2.1 revision 3
********************************************************************************
Toolchain information:
GLIBC version: 2.27
GLIBCXX: 20160609
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 4 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Untested OS
CL_DEVICE_NAME : Carrizo
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2766.4
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:
Device is supported : UNTESTED - Untested OS
CL_DEVICE_NAME : Iceland
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2766.4
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:
Device is supported : NO - Device does not support SPIR
CL_DEVICE_NAME : AMD Radeon R6 Graphics (CARRIZO, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
CL_DEVICE_VENDOR : AMD
CL_DRIVER_VERSION : 18.2.2
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:
Device is supported : NO - Device does not support SPIR
CL_DEVICE_NAME : AMD Radeon (TM) R7 M360 (ICELAND, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
CL_DEVICE_VENDOR : AMD
CL_DRIVER_VERSION : 18.2.2
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.5/platform-support-notes
********************************************************************************
And here is the output of clinfo (spir64 is reported as supported):
Number of platforms 2
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (2766.4)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 18.2.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name AMD Accelerated Parallel Processing
Number of devices 2
Device Name Carrizo
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2766.4)
Driver Version 2766.4
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Board Name (AMD) AMD Radeon R6 Graphics
Device Topology (AMD) PCI-E, 00:01.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 6
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 800MHz
Graphics IP (AMD) 8.0
Device Partition (core)
Max number of sub-devices 6
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4099133440 (3.818GiB)
Global free memory (AMD) 7784060 (7.423GiB)
Global memory channels (AMD) 2
Global memory banks per channel (AMD) 8
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 3924295680 (3.655GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 3924295680 (3.655GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1549961647793167641ns (Tue Feb 12 14:24:07 2019)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 2
Max real-time compute queues (AMD) 0
Max real-time compute units (AMD) 0
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Name Iceland
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2766.4)
Driver Version 2766.4
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Board Name (AMD) AMD Radeon (TM) R7 M360
Device Topology (AMD) PCI-E, 04:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 6
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1125MHz
Graphics IP (AMD) 8.0
Device Partition (core)
Max number of sub-devices 6
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 2145787904 (1.998GiB)
Global free memory (AMD) 2075528 (1.979GiB)
Global memory channels (AMD) 2
Global memory banks per channel (AMD) 8
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 1878712320 (1.75GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 1878712320 (1.75GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1549961647793167641ns (Tue Feb 12 14:24:07 2019)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 2
Max real-time compute queues (AMD) 0
Max real-time compute units (AMD) 2415040557
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Platform Name Clover
Number of devices 2
Device Name AMD Radeon R6 Graphics (CARRIZO, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 18.2.2
Driver Version 18.2.2
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Max compute units 6
Max clock frequency 800MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 7849488384 (7.31GiB)
Error Correction support No
Max memory allocation 5874880512 (5.471GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 32768 bits (4096 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max number of constant args 16
Max constant buffer size 2147483647 (2GiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16
Device Name AMD Radeon (TM) R7 M360 (ICELAND, DRM 3.27.0, 4.15.0-45-generic, LLVM 7.0.0)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 18.2.2
Driver Version 18.2.2
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Max compute units 6
Max clock frequency 1125MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 7849488384 (7.31GiB)
Error Correction support No
Max memory allocation 5885116416 (5.481GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 32768 bits (4096 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max number of constant args 16
Max constant buffer size 2147483647 (2GiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Carrizo
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (2)
Platform Name AMD Accelerated Parallel Processing
Device Name Carrizo
Device Name Iceland
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (2)
Platform Name AMD Accelerated Parallel Processing
Device Name Carrizo
Device Name Iceland
But none of the ComputeCpp samples run; each one instead throws cl::sycl::compile_program_error. Here is the gdb output, with backtrace (bt), for the simple-vector-add sample:
(gdb) run simple-vector-add
Starting program: /home/abhishek/computecpp-sdk/build/samples/simple-vector-add/simple-vector-add simple-vector-add
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5f14700 (LWP 11659)]
[New Thread 0x7ffff5713700 (LWP 11660)]
[New Thread 0x7ffff4f12700 (LWP 11661)]
[New Thread 0x7fffe61cd700 (LWP 11663)]
[New Thread 0x7fffe5049700 (LWP 11664)]
[New Thread 0x7fffe4848700 (LWP 11665)]
[New Thread 0x7fffd7fff700 (LWP 11666)]
[New Thread 0x7fffd77fe700 (LWP 11667)]
[New Thread 0x7fffd6ffd700 (LWP 11668)]
[New Thread 0x7fffd67fc700 (LWP 11669)]
[New Thread 0x7fffd5ffb700 (LWP 11670)]
[New Thread 0x7fffd56b9700 (LWP 11671)]
[New Thread 0x7fffd4eb8700 (LWP 11672)]
[New Thread 0x7fffb7fff700 (LWP 11673)]
[New Thread 0x7fffb77fe700 (LWP 11674)]
[New Thread 0x7fffb6ffd700 (LWP 11675)]
[New Thread 0x7fffb67fc700 (LWP 11676)]
[New Thread 0x7fffb5ffb700 (LWP 11677)]
[New Thread 0x7fffb57fa700 (LWP 11678)]
[New Thread 0x7fffb4ff9700 (LWP 11679)]
[New Thread 0x7fff97fff700 (LWP 11680)]
[Thread 0x7fff97fff700 (LWP 11680) exited]
[Thread 0x7fffb4ff9700 (LWP 11679) exited]
[Thread 0x7fffb57fa700 (LWP 11678) exited]
[Thread 0x7fffb5ffb700 (LWP 11677) exited]
[New Thread 0x7fffe6601700 (LWP 11681)]
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
Thread 1 "simple-vector-a" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff6b25801 in __GI_abort () at abort.c:79
#2 0x00007ffff717a8b7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff7180a06 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff7180a41 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff7180c74 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff75a38ff in void cl::sycl::detail::handle_sycl_log<cl::sycl::compile_program_error>(cl::sycl::detail::sycl_log&&) () from /usr/local/computecpp/lib/libComputeCpp.so
#7 0x00007ffff759bd94 in cl::sycl::detail::trigger_sycl_log(cl::sycl::log_type, char const*, int, int, cl::sycl::detail::cpp_error_code, cl::sycl::detail::context const*, char const*) ()
from /usr/local/computecpp/lib/libComputeCpp.so
#8 0x00007ffff7609c1a in cl::sycl::detail::program::handle_build_failure(int, cl::sycl::detail::cpp_error_code, cl::sycl::detail::program_state, std::shared_ptr<cl::sycl::detail::context> const&) ()
from /usr/local/computecpp/lib/libComputeCpp.so
#9 0x00007ffff760a9f8 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#10 0x00007ffff760ace9 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#11 0x00007ffff75d37ae in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#12 0x00007ffff75d74d9 in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/local/computecpp/lib/libComputeCpp.so
#13 0x000055555555ba45 in cl::sycl::program cl::sycl::program::create_program_for_kernel<SimpleVadd<int> >(cl::sycl::context) ()
#14 0x000055555555a7c1 in void cl::sycl::handler::parallel_for_impl<SimpleVadd<int>, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1}>(cl::sycl::detail::index_array const&, cl::sycl::detail::index_array, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1} const&) ()
#15 0x0000555555559771 in void cl::sycl::handler::parallel_for<SimpleVadd<int>, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1}, 1>(cl::sycl::range<1> const&, void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda(cl::sycl::id<1>)#1} const&) ()
#16 0x0000555555558007 in void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const
()
#17 0x000055555555a9f7 in cl::sycl::event cl::sycl::detail::command_group::submit_handler<void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}>(void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue> const&, cl::sycl::detail::standard_handler_tag) ()
#18 0x000055555555984c in cl::sycl::event cl::sycl::queue::submit<void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}>(void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&)::{lambda(cl::sycl::handler&)#1}) ()
#19 0x000055555555846d in void simple_vadd<int, 4ul>(std::array<int, 4ul> const&, std::array<int, 4ul> const&, std::array<int, 4ul>&) ()
#20 0x000055555555702a in main ()
(gdb)
Any help is appreciated.
Here is the CMake output from building computecpp-sdk:
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/lib/ccache/cc
-- Check for working C compiler: /usr/lib/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/lib/ccache/c++
-- Check for working CXX compiler: /usr/lib/ccache/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - found
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.2")
-- platform - your system can support ComputeCpp
-- Found ComputeCpp: /usr/local/computecpp (found version "CE 1.0.5")
-- compute++ flags - -O2;-mllvm;-inline-threshold=1000;-intelspirmetadata;-sycl-target;spir64
-- Configuring done
-- Generating done
-- Build files have been written to: /home/abhishek/computecpp-sdk/build
I think AMD dropped SPIR support in ROCm and the latest AMD Pro drivers, since only a few of their clients need it. Also, only dummy SPIR support can be seen in the clinfo output above. Does anyone know which is the last Pro driver with SPIR/SPIR-V support?
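To make the clinfo observation concrete: the Device Extensions line is a space-separated list, and a device advertises SPIR through the cl_khr_spir token. Below is a minimal C++ sketch of that check, assuming the extension string has already been obtained (for example from clGetDeviceInfo with CL_DEVICE_EXTENSIONS). The helper name has_extension is hypothetical and is not part of OpenCL or ComputeCpp.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCL or ComputeCpp API): returns true if
// `name` appears as a whole whitespace-separated token in an OpenCL
// extension string such as the "Device Extensions" line printed by clinfo.
bool has_extension(const std::string& extensions, const std::string& name) {
  std::istringstream tokens(extensions);
  std::string token;
  while (tokens >> token) {
    if (token == name) {
      return true;  // exact token match, so "cl_khr_spir" != "cl_khr_spirv"
    }
  }
  return false;
}
```

Applied to the lists in the clinfo output above, this returns true for the AMD-APP devices (Carrizo, Iceland) and false for the Clover devices. It only tells whether the driver advertises SPIR, not whether its SPIR compiler actually works, which appears to be the situation here.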
Hi, it seems there is indeed an issue with SPIR in their latest driver.
I recommend that you keep using spir64 and install the driver we mention here: https://developer.codeplay.com/computecppce/latest/getting-started-with-tensorflow
@Rbiessy Yes, AMD Pro 17.50 might be the appropriate driver, but it yields the following error:
Loading new amdgpu-17.50-511655 DKMS files...
Building for 4.15.0-45-generic
Building for architecture x86_64
Building initial module for 4.15.0-45-generic
Error! Bad return status for module build on kernel: 4.15.0-45-generic (x86_64)
Consult /var/lib/dkms/amdgpu/17.50-511655/build/make.log for more information.
These drivers only support Ubuntu 16.04.3 with kernel 4.10.XX-generic. In fact, the last kernel that 17.50 compiles fine under is 4.13.9, which rules out my kernel on Ubuntu 18.04, i.e. 4.15.0-45-generic. So still no luck with the installation! At this point, I can only install AMD Pro drivers 18.20 and later, but they don't support SPIR. Also, the ROCm drivers for TensorFlow are not supported by my Kaveri APUs and Iceland GPUs. This is really frustrating :(
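The kernel constraint described in this comment can be checked mechanically before attempting a driver install. Below is a minimal C++ sketch, assuming the release string has the form printed by uname -r (major.minor.patch followed by an optional suffix). The names KernelVersion, parse_kernel_release and is_newer are hypothetical, and the 4.13.9 limit is the figure quoted in this thread, not something taken from AMD documentation.

```cpp
#include <array>
#include <cstdio>
#include <string>

// {major, minor, patch} of a kernel release such as "4.15.0-45-generic".
using KernelVersion = std::array<int, 3>;

// Hypothetical helper: parses the leading "major.minor.patch" of a
// `uname -r` style string. Returns false if the string does not start
// with three dot-separated integers.
bool parse_kernel_release(const std::string& release, KernelVersion& out) {
  return std::sscanf(release.c_str(), "%d.%d.%d",
                     &out[0], &out[1], &out[2]) == 3;
}

// True if kernel `a` is newer than kernel `b`.
// std::array compares element by element, i.e. lexicographically.
bool is_newer(const KernelVersion& a, const KernelVersion& b) {
  return a > b;
}
```

With the values from this thread, 4.15.0-45-generic parses to {4, 15, 0}, which compares newer than 4.13.9, so the 17.50 DKMS module would not be expected to build on it.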
In the past, AMD dropped APPSDK, and now it has dropped SPIR/SPIR-V support for GPUs and is focusing only on high-end graphics cards through ROCm. This is why NVIDIA is progressing so fast in this sector: they have proper support and appropriate drivers for their cards.
Hi @abhiTronix, it is difficult to find drivers that work in all cases. The most success we've had is with the AMDGPU-PRO drivers that report "2482.3" as the version in clinfo (somewhat older than what you were running earlier). This is running on an 18.04 system with an R9 Nano GPU. I'm afraid I don't have a download link to hand, but it should be the same line of drivers as you had installed, just older. I hope this helps!
@DuncanMcBain Thanks for helping. Is this your card's clinfo output?
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2482.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
If yes, then the AMD driver version is amdgpu-pro-17.40.492261. I haven't tried it yet, but both amdgpu-pro-17.50 and amdgpu-pro-18.10 failed to work on my system. Can you please confirm your system's kernel version? It would be awesome if it works somehow. Meanwhile, I'm preparing to dual-boot Ubuntu 16.04 on my machine to set up the old AMD drivers.
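Since the drivers in this thread are identified by the AMD-APP build number in the platform version string rather than by package name, it can help to extract that number when comparing installations. Below is a minimal C++ sketch, assuming the version string has the "OpenCL x.y AMD-APP (build)" shape shown in the clinfo output. The helper name amd_app_build is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the text inside the last pair of parentheses
// of an OpenCL platform version string, or an empty string if there is none.
std::string amd_app_build(const std::string& platform_version) {
  const auto close = platform_version.rfind(')');
  if (close == std::string::npos) {
    return "";
  }
  const auto open = platform_version.rfind('(', close);
  if (open == std::string::npos) {
    return "";
  }
  return platform_version.substr(open + 1, close - open - 1);
}
```

For "OpenCL 2.0 AMD-APP (2482.3)" this yields "2482.3", and for the "OpenCL 2.1 AMD-APP (2766.4)" string from the original report it yields "2766.4". A Mesa string such as "OpenCL 1.1 Mesa 18.2.2" has no parentheses and yields an empty string.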
This isn't actually my machine, I was asking around the office. I'll have to confirm tomorrow. That is certainly the version though!
Speaking to my colleague, apparently he's kept everything the same and is using the default kernel. Let us know if it works!
@DuncanMcBain I guess not. The default kernel version on my Ubuntu 18.04.1 is 4.15.0-45-generic, which is incompatible with AMD drivers older than 18.20, according to my tests and other forum reports. Can you confirm the exact kernel version if possible? The uname -r output, to be specific, would be helpful.
That is what it says on his PC too, so I don't know what to tell you!
That's weird. AMD supports the R9 Nano but can't support the R7? Anyway, I'm working on getting my drivers to work on Ubuntu 16.04.
So this driver won't install for you on 18.04 then? That's very strange, my colleague did have to try lots of different versions, but this was one that worked (and can run SPIR, he uses it as his development machine).
Solved by dual-booting Ubuntu 16.04 with AMD Pro drivers version 17.50. Thanks for the support. Kudos!