intel/opencl-intercept-layer

How to use DevicePerfCounterTimeBasedSampling?

mcleary opened this issue · 11 comments

Observed Behavior

Profiling my application with CLI_DevicePerfCounterTimeBasedSampling=1 gives me the message E:[MDAPI]: enable timer mode escape failed MDAPI Helper: OpenIoStream failed 42

Desired Behavior

I would like to be able to sample the performance counters of my Intel GPU when running my OpenCL application.

Steps to Reproduce

I am just running a small OpenCL application with the intercept layer. Is this because my device doesn't support this feature?

Here are the values for all controls I am using:

"CLI_ChromeCallLogging": 0,
"CLI_DevicePerformanceTimelineLogging": 1,
"CLI_DevicePerformanceTimeLogging": 1,
"CLI_DevicePerformanceTiming ": 1,
"CLI_DevicePerfCounterTimeBasedSampling": 1,
"CLI_CLInfoLogging": 1,
"CLI_DevicePerfCounterEventBasedSampling": 0,
"CLI_ReportToFile": 1,
"CLI_EventCallbackLogging": 1,
"CLI_HostPerformanceTiming": 1,
"CLI_DevicePerfCounterTiming": 1,
"CLI_DevicePerfCounterCustom": "ComputeExtended"
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: C:\Users\Thales\Desktop\syclone\out\build\x64-Debug\test\system\tiled_convolution\Debug\OpenCL.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-7-g08e6ba8
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 08e6ba842d66643edc8b06f119316f732c28b604
CLIntercept optional features:
    cliloader(NOT supported)
    cliprof(NOT supported)
    kernel overrides(supported)
    ITT tracing(NOT supported)
    MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT
Trying to load dispatch from: real_opencl.dll
Couldn't load library: real_opencl.dll
Trying to load dispatch from: C:\Windows/syswow64/opencl.dll
Couldn't load library: C:\Windows/syswow64/opencl.dll
Trying to load dispatch from: C:\Windows/system32/opencl.dll
Couldn't get exported function pointer to: clCreateBufferWithProperties
Couldn't get exported function pointer to: clCreateImageWithProperties
... success!
EventCallbackLogging is set to a non-default value!
CLInfoLogging is set to a non-default value!
HostPerformanceTiming is set to a non-default value!
DevicePerformanceTimeLogging is set to a non-default value!
DevicePerformanceTimelineLogging is set to a non-default value!
DevicePerfCounterTimeBasedSampling is set to a non-default value!
DevicePerfCounterCustom is set to a non-default value!
DevicePerfCounterTiming is set to a non-default value!
Metric Discovery initialized.
Timer Started!
... loading complete.
Running [tiled_convolution]. Executing on [intel:gpu].

Enumerated 2 platforms.

Platform 0:
        Name:           Intel(R) OpenCL
        Vendor:         Intel(R) Corporation
        Driver Version: OpenCL 2.1 WINDOWS
        Profile:        FULL_PROFILE
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_khr_il_program
                cl_intel_unified_shared_memory_preview
                cl_intel_exec_by_local_thread
                cl_intel_vec_len_hint
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                17 Platform Extensions Found
        Platform has 1 device.

Device 0:
        Name:           Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 (Build 0)
        Driver Version: 2020.10.3.0.04
        Type:           CL_DEVICE_TYPE_CPU
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_khr_il_program
                cl_intel_unified_shared_memory_preview
                cl_intel_exec_by_local_thread
                cl_intel_vec_len_hint
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                17 Device Extensions Found

Platform 1:
        Name:           Intel(R) OpenCL
        Vendor:         Intel(R) Corporation
        Driver Version: OpenCL 2.1
        Profile:        FULL_PROFILE
        Extensions:
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_fp64
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_spir
                11 Platform Extensions Found
        Platform has 2 devices.

Device 0:
        Name:           Intel(R) HD Graphics 520
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 NEO
        Driver Version: 26.20.100.7463
        Type:           CL_DEVICE_TYPE_GPU
        Extensions:
                cl_khr_3d_image_writes
                cl_khr_byte_addressable_store
                cl_khr_fp16
                cl_khr_depth_images
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_image2d_from_buffer
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_intel_subgroups
                cl_intel_required_subgroup_size
                cl_intel_subgroups_short
                cl_khr_spir
                cl_intel_accelerator
                cl_intel_media_block_io
                cl_intel_driver_diagnostics
                cl_khr_priority_hints
                cl_khr_throttle_hints
                cl_khr_create_command_queue
                cl_khr_fp64
                cl_khr_subgroups
                cl_khr_il_program
                cl_intel_spirv_device_side_avc_motion_estimation
                cl_intel_spirv_media_block_io
                cl_intel_spirv_subgroups
                cl_khr_spirv_no_integer_wrap_decoration
                cl_khr_mipmap_image
                cl_khr_mipmap_image_writes
                cl_intel_unified_shared_memory_preview
                cl_intel_planar_yuv
                cl_intel_packed_yuv
                cl_intel_motion_estimation
                cl_intel_device_side_avc_motion_estimation
                cl_intel_advanced_motion_estimation
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_gl_sharing
                cl_khr_gl_depth_images
                cl_khr_gl_event
                cl_khr_gl_msaa_sharing
                cl_intel_dx9_media_sharing
                cl_khr_dx9_media_sharing
                cl_khr_d3d10_sharing
                cl_khr_d3d11_sharing
                cl_intel_d3d11_nv12_media_sharing
                cl_intel_simultaneous_sharing
                47 Device Extensions Found

Device 1:
        Name:           Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
        Vendor:         Intel(R) Corporation
        Version:        OpenCL 2.1 (Build 0)
        Driver Version: 7.6.0.0814
        Type:           CL_DEVICE_TYPE_CPU
        Extensions:
                cl_khr_icd
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_khr_byte_addressable_store
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_intel_exec_by_local_thread
                cl_khr_spir
                cl_khr_fp64
                cl_khr_image2d_from_buffer
                cl_intel_vec_len_hint
                13 Device Extensions Found

Thanks for the excellent report! I'm sorry this isn't working for you. Your device should support time based sampling.

A few initial things to check:

  1. Have you tried rebooting recently? I've noticed that very rarely time-based sampling can get stuck in an enabled state, then subsequent attempts to enable it results in an error. There are some ways to fix this without rebooting, but rebooting is the most reliable and simplest mechanism to recover.

  2. Does the account you are running with have Administrator privileges? I'm not 100% sure about Windows, but I do know that Linux time-based sampling requires elevated privileges because the metrics are collected system-wide.

Hi, thanks for the reply. Yes, I did reboot my machine recently, in fact, I just rebooted it to make sure. I also tried running with Administrator permissions but I got the same result. To remove my application as a variable in this equation, I did a test where I ran a simple OpenCL. The code I am using for this test can be found here.

And here is the output log I get from Visual Studio

'CLExample.exe' (Win32): Loaded 'C:\Users\Thales\Desktop\SimpleOpenCL\out\build\x64-Debug\CLExample.exe'. Symbols loaded.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ntdll.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\kernel32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\KernelBase.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Users\Thales\Desktop\SimpleOpenCL\out\build\x64-Debug\OpenCL.dll'. Symbols loaded.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\setupapi.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msvcrt.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\cfgmgr32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbase.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\rpcrt4.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\bcrypt.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\shlwapi.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\combase.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\bcryptprimitives.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\gdi32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\win32u.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\gdi32full.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msvcp_win.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\user32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\advapi32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\sechost.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msvcp140d.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140d.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140_1d.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbased.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\imm32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\SysWOW64\OpenCL.dll'. 
'CLExample.exe' (Win32): Unloaded 'C:\Windows\SysWOW64\OpenCL.dll'
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\OpenCL.dll'. Module was built without symbols.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ole32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\devobj.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\drvstore.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\wintrust.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msasn1.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\crypt32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_9307632b3abedc2b\igdmd64.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\shell32.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\dxgi.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\SHCore.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\windows.storage.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\DXCore.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\profapi.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\powrprof.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\umpdc.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\kernel.appcore.dll'. 
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\cryptsp.dll'. 
MDAPI Helper: MDAPI Headers: v1.5.96, MDAPI Lib: v1.5.152
MDAPI Helper: Looking for MetricSet: ComputeBasic, API: 1, Category: F
MDAPI Helper: Skipped Group: OcclusionQueryStats MetricSet: RenderedPixelsStats MetricCount: 1 API: 12E, Category: 1
MDAPI Helper: Skipped Group: OcclusionQueryStats MetricSet: RenderedFragmentsStats MetricCount: 1 API: 30, Category: 1
MDAPI Helper: Skipped Group: TimestampQuery MetricSet: GPUTimestamp MetricCount: 1 API: 12E, Category: 8
MDAPI Helper: Skipped Group: PipelineStatistics MetricSet: PipelineStats MetricCount: 11 API: 30, Category: 3
MDAPI Helper: Skipped Group: PipelineTimestamps MetricSet: PipelineTimestamps MetricCount: 4 API: 3C, Category: 1
MDAPI Helper: Skipped Group: MMIO_Regs MetricSet: McRequests MetricCount: 3 API: 13E, Category: 8
MDAPI Helper: Skipped Group: MMIO_Regs MetricSet: EDRAM traffic MetricCount: 6 API: 17E, Category: 8
MDAPI Helper: Skipped Group: OA MetricSet: RenderBasic MetricCount: 52 API: 37F, Category: 3
MDAPI Helper: Matched Group: OA MetricSet: ComputeBasic MetricCount: 39 API: 36D, Category: 3
MDAPI Helper: MetricsDiscoveryInit End
MDAPI Helper: Trying device maximum buffer size = 16777216 bytes.
E:[MDAPI]: enable timer mode escape failed MDAPI Helper: OpenIoStream failed 42
The thread 0xe88 has exited with code 0 (0x0).

Would you have any more clues on what could be wrong based on this report?

Interesting, upgrading my GPU driver to 27.20.100.8190 seems to do the trick

Actually I spoke too soon. It turns out that everything works file after a reboot, however, if I interrupt a debug session, I will get the same error and other applications will also stop working. Is there something I can do to make sure the intercept layer can be interrupted like that?

Here is a screen capture showing the problem inside Visual Studio. What happens in the video

https://www.youtube.com/watch?v=O4iBnyq9U9Y

  1. Machine just rebooted
  2. Visual Studio debugging session with the code found here
  3. Debugging session runs with no problems
  4. Debugging session starts again. Green message, everything is fine
  5. Debugging session is now interrupted
  6. Intercept layer stops working

Hello, thank you again for your excellent description!

I think I know what's happening, and in short you found a very reliable way to get into the "stuck" state I mentioned above. Basically, when Visual Studio ends the debugging session it doesn't call into DLLs to detach. As a result, the intercept layer shutdown routine doesn't get called, and MDAPI is not properly closed. So, for the next execution, MDAPI is already enabled, and the call to OpenIoStream fails.

I'm in contact with the MDAPI folks to see if there's a reliable way to reset MDAPI if it gets into this state. Stay tuned...

Excellent. Glad to help

Any news on this issue?

I've filed a public issue against the metrics library to track this issue:

intel/metrics-discovery#18

Thanks for the help

See comment on the issue filed above: intel/metrics-discovery#18 (comment)

This issue should no longer occur due to improvements within MDAPI. If you still encounter issues please let me know and I will try some experiments calling IAdapter_1_6::Reset() to get out of the "stuck" state - thanks!