Missing kernel execution on Timeline + wrong Duration
PolarNick239 opened this issue · 2 comments
I have a simple project that implements merge sort. There are four kernels: bitonic_local, merge_mp, merge and get_merge_path.
I observe this:
- No bitonic_local in Kernel Execution in the timeline, while merge_mp, merge and get_merge_path are shown.
- Strange Duration(ms) timings for bitonic_local in Top10 Kernel Summary.
- Also, in GPU: Performance Counters, bitonic_local has the same strange duration in the Time column.
Environment: Windows 10 + AMD R9 390X.
Driver: 18.5.1 and 18.9.3 (so although this issue seems similar to #62, it also reproduces with the latest driver).
CodeXL: 2.5.67.0 and 2.6.361.0.
Compiler: Visual Studio 2017 Community.
Steps to reproduce:
- Get source code
- Compile like this:
"c:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat"
cd ...\path\to\project
mkdir build
cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
ninja -j8
- In CodeXL, launch ...\path\to\project\build\merge.exe under GPU timeline tracing profiling (if you have more than one OpenCL device, specify its index via Command Line Arguments; see the application's output for the mapping of indices to OpenCL devices).
Sorry for the delayed reply here. Our OpenCL driver team investigated this issue and responded that the incorrect timestamps are caused by a buffer overrun in the bitonic_local kernel. The kernel code that writes to the "as" parameter writes past the end of the buffer by two floats. In GPU memory, the location for the kernel timestamps immediately follows the "as" buffer, so the overrun ends up writing FLT_MAX into the location where the timestamps are written.
It looks like if you modify the code on line 40 of the kernel as follows, everything works as expected:
Change:
if (global_i * 2 <= n) {
To:
if (global_i * 2 < n) {
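To illustrate why the inclusive comparison overruns by exactly two floats, here is a minimal host-side sketch in plain C. It is not the project's kernel: the function name count_out_of_bounds_writes is hypothetical, and it assumes that each work-item writes elements 2*global_i and 2*global_i + 1 of an n-float buffer, that n is even, and that the global size may be rounded up past n / 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the write pattern (an assumption, not the real kernel):
   work-item global_i writes elements 2*global_i and 2*global_i + 1 of a buffer
   holding n floats. Returns how many writes would land at an index >= n. */
size_t count_out_of_bounds_writes(size_t n, size_t work_items, int inclusive_guard)
{
    assert(n % 2 == 0); /* this sketch only models even n */

    size_t overrun = 0;
    for (size_t global_i = 0; global_i < work_items; ++global_i) {
        size_t first = global_i * 2;

        /* inclusive_guard != 0 models "global_i * 2 <= n",
           inclusive_guard == 0 models "global_i * 2 < n". */
        int passes = inclusive_guard ? (first <= n) : (first < n);
        if (!passes)
            continue;

        if (first >= n)
            ++overrun;      /* first element of the pair is past the end */
        if (first + 1 >= n)
            ++overrun;      /* second element of the pair is past the end */
    }
    return overrun;
}
```

Under these assumptions, with n = 8 and 8 work-items, the inclusive guard lets the work-item with global_i = 4 through, and it writes indices 8 and 9, two floats past the end. With the strict guard that work-item is skipped and no write lands outside the buffer.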
Oh, my fault! Thank you and the driver team, especially for the detailed explanation!