How to use DevicePerfCounterTimeBasedSampling?
mcleary opened this issue · 11 comments
Observed Behavior
Profiling my application with CLI_DevicePerfCounterTimeBasedSampling=1
gives me the message E:[MDAPI]: enable timer mode escape failed MDAPI Helper: OpenIoStream failed 42
Desired Behavior
I would like to be able to sample the performance counters of my Intel GPU when running my OpenCL application.
Steps to Reproduce
I am just running a small OpenCL application with the intercept layer. Is this because my device doesn't support this feature?
Here are the values for all controls I am using:
"CLI_ChromeCallLogging": 0,
"CLI_DevicePerformanceTimelineLogging": 1,
"CLI_DevicePerformanceTimeLogging": 1,
"CLI_DevicePerformanceTiming ": 1,
"CLI_DevicePerfCounterTimeBasedSampling": 1,
"CLI_CLInfoLogging": 1,
"CLI_DevicePerfCounterEventBasedSampling": 0,
"CLI_ReportToFile": 1,
"CLI_EventCallbackLogging": 1,
"CLI_HostPerformanceTiming": 1,
"CLI_DevicePerfCounterTiming": 1,
"CLI_DevicePerfCounterCustom": "ComputeExtended"
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
CLIntercept (64-bit) is loading...
CLintercept file location: C:\Users\Thales\Desktop\syclone\out\build\x64-Debug\test\system\tiled_convolution\Debug\OpenCL.dll
CLIntercept URL: https://github.com/intel/opencl-intercept-layer
CLIntercept git description: v2.2.2-7-g08e6ba8
CLIntercept git refspec: refs/heads/master
CLInterecpt git hash: 08e6ba842d66643edc8b06f119316f732c28b604
CLIntercept optional features:
cliloader(NOT supported)
cliprof(NOT supported)
kernel overrides(supported)
ITT tracing(NOT supported)
MDAPI(supported)
CLIntercept environment variable prefix: CLI_
CLIntercept registry key: SOFTWARE\INTEL\IGFX\CLINTERCEPT
Trying to load dispatch from: real_opencl.dll
Couldn't load library: real_opencl.dll
Trying to load dispatch from: C:\Windows/syswow64/opencl.dll
Couldn't load library: C:\Windows/syswow64/opencl.dll
Trying to load dispatch from: C:\Windows/system32/opencl.dll
Couldn't get exported function pointer to: clCreateBufferWithProperties
Couldn't get exported function pointer to: clCreateImageWithProperties
... success!
EventCallbackLogging is set to a non-default value!
CLInfoLogging is set to a non-default value!
HostPerformanceTiming is set to a non-default value!
DevicePerformanceTimeLogging is set to a non-default value!
DevicePerformanceTimelineLogging is set to a non-default value!
DevicePerfCounterTimeBasedSampling is set to a non-default value!
DevicePerfCounterCustom is set to a non-default value!
DevicePerfCounterTiming is set to a non-default value!
Metric Discovery initialized.
Timer Started!
... loading complete.
Running [tiled_convolution]. Executing on [intel:gpu].
Enumerated 2 platforms.
Platform 0:
Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Driver Version: OpenCL 2.1 WINDOWS
Profile: FULL_PROFILE
Extensions:
cl_khr_icd
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_byte_addressable_store
cl_khr_depth_images
cl_khr_3d_image_writes
cl_khr_il_program
cl_intel_unified_shared_memory_preview
cl_intel_exec_by_local_thread
cl_intel_vec_len_hint
cl_khr_spir
cl_khr_fp64
cl_khr_image2d_from_buffer
17 Platform Extensions Found
Platform has 1 device.
Device 0:
Name: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Vendor: Intel(R) Corporation
Version: OpenCL 2.1 (Build 0)
Driver Version: 2020.10.3.0.04
Type: CL_DEVICE_TYPE_CPU
Extensions:
cl_khr_icd
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_byte_addressable_store
cl_khr_depth_images
cl_khr_3d_image_writes
cl_khr_il_program
cl_intel_unified_shared_memory_preview
cl_intel_exec_by_local_thread
cl_intel_vec_len_hint
cl_khr_spir
cl_khr_fp64
cl_khr_image2d_from_buffer
17 Device Extensions Found
Platform 1:
Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Driver Version: OpenCL 2.1
Profile: FULL_PROFILE
Extensions:
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_depth_images
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_icd
cl_khr_image2d_from_buffer
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_spir
11 Platform Extensions Found
Platform has 2 devices.
Device 0:
Name: Intel(R) HD Graphics 520
Vendor: Intel(R) Corporation
Version: OpenCL 2.1 NEO
Driver Version: 26.20.100.7463
Type: CL_DEVICE_TYPE_GPU
Extensions:
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_fp16
cl_khr_depth_images
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_icd
cl_khr_image2d_from_buffer
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_intel_subgroups
cl_intel_required_subgroup_size
cl_intel_subgroups_short
cl_khr_spir
cl_intel_accelerator
cl_intel_media_block_io
cl_intel_driver_diagnostics
cl_khr_priority_hints
cl_khr_throttle_hints
cl_khr_create_command_queue
cl_khr_fp64
cl_khr_subgroups
cl_khr_il_program
cl_intel_spirv_device_side_avc_motion_estimation
cl_intel_spirv_media_block_io
cl_intel_spirv_subgroups
cl_khr_spirv_no_integer_wrap_decoration
cl_khr_mipmap_image
cl_khr_mipmap_image_writes
cl_intel_unified_shared_memory_preview
cl_intel_planar_yuv
cl_intel_packed_yuv
cl_intel_motion_estimation
cl_intel_device_side_avc_motion_estimation
cl_intel_advanced_motion_estimation
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_gl_sharing
cl_khr_gl_depth_images
cl_khr_gl_event
cl_khr_gl_msaa_sharing
cl_intel_dx9_media_sharing
cl_khr_dx9_media_sharing
cl_khr_d3d10_sharing
cl_khr_d3d11_sharing
cl_intel_d3d11_nv12_media_sharing
cl_intel_simultaneous_sharing
47 Device Extensions Found
Device 1:
Name: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Vendor: Intel(R) Corporation
Version: OpenCL 2.1 (Build 0)
Driver Version: 7.6.0.0814
Type: CL_DEVICE_TYPE_CPU
Extensions:
cl_khr_icd
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_depth_images
cl_khr_3d_image_writes
cl_intel_exec_by_local_thread
cl_khr_spir
cl_khr_fp64
cl_khr_image2d_from_buffer
cl_intel_vec_len_hint
13 Device Extensions Found
Thanks for the excellent report! I'm sorry this isn't working for you. Your device should support time based sampling.
A few initial things to check:
-
Have you tried rebooting recently? I've noticed that very rarely time-based sampling can get stuck in an enabled state, then subsequent attempts to enable it results in an error. There are some ways to fix this without rebooting, but rebooting is the most reliable and simplest mechanism to recover.
-
Does the account you are running with have Administrator privileges? I'm not 100% sure about Windows, but I do know that Linux time-based sampling requires elevated privileges because the metrics are collected system-wide.
Hi, thanks for the reply. Yes, I did reboot my machine recently, in fact, I just rebooted it to make sure. I also tried running with Administrator permissions but I got the same result. To remove my application as a variable in this equation, I did a test where I ran a simple OpenCL. The code I am using for this test can be found here.
And here is the output log I get from Visual Studio
'CLExample.exe' (Win32): Loaded 'C:\Users\Thales\Desktop\SimpleOpenCL\out\build\x64-Debug\CLExample.exe'. Symbols loaded.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ntdll.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\kernel32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\KernelBase.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Users\Thales\Desktop\SimpleOpenCL\out\build\x64-Debug\OpenCL.dll'. Symbols loaded.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\setupapi.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msvcrt.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\cfgmgr32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbase.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\rpcrt4.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\bcrypt.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\shlwapi.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\combase.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\bcryptprimitives.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\gdi32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\win32u.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\gdi32full.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msvcp_win.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\user32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\advapi32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\sechost.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msvcp140d.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140d.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140_1d.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbased.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\imm32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\SysWOW64\OpenCL.dll'.
'CLExample.exe' (Win32): Unloaded 'C:\Windows\SysWOW64\OpenCL.dll'
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\OpenCL.dll'. Module was built without symbols.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\ole32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\devobj.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\drvstore.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\wintrust.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\msasn1.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\crypt32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_9307632b3abedc2b\igdmd64.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\shell32.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\dxgi.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\SHCore.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\windows.storage.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\DXCore.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\profapi.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\powrprof.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\umpdc.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\kernel.appcore.dll'.
'CLExample.exe' (Win32): Loaded 'C:\Windows\System32\cryptsp.dll'.
MDAPI Helper: MDAPI Headers: v1.5.96, MDAPI Lib: v1.5.152
MDAPI Helper: Looking for MetricSet: ComputeBasic, API: 1, Category: F
MDAPI Helper: Skipped Group: OcclusionQueryStats MetricSet: RenderedPixelsStats MetricCount: 1 API: 12E, Category: 1
MDAPI Helper: Skipped Group: OcclusionQueryStats MetricSet: RenderedFragmentsStats MetricCount: 1 API: 30, Category: 1
MDAPI Helper: Skipped Group: TimestampQuery MetricSet: GPUTimestamp MetricCount: 1 API: 12E, Category: 8
MDAPI Helper: Skipped Group: PipelineStatistics MetricSet: PipelineStats MetricCount: 11 API: 30, Category: 3
MDAPI Helper: Skipped Group: PipelineTimestamps MetricSet: PipelineTimestamps MetricCount: 4 API: 3C, Category: 1
MDAPI Helper: Skipped Group: MMIO_Regs MetricSet: McRequests MetricCount: 3 API: 13E, Category: 8
MDAPI Helper: Skipped Group: MMIO_Regs MetricSet: EDRAM traffic MetricCount: 6 API: 17E, Category: 8
MDAPI Helper: Skipped Group: OA MetricSet: RenderBasic MetricCount: 52 API: 37F, Category: 3
MDAPI Helper: Matched Group: OA MetricSet: ComputeBasic MetricCount: 39 API: 36D, Category: 3
MDAPI Helper: MetricsDiscoveryInit End
MDAPI Helper: Trying device maximum buffer size = 16777216 bytes.
E:[MDAPI]: enable timer mode escape failed MDAPI Helper: OpenIoStream failed 42
The thread 0xe88 has exited with code 0 (0x0).
Would you have any more clues on what could be wrong based on this report?
Interesting, upgrading my GPU driver to 27.20.100.8190
seems to do the trick
Actually I spoke too soon. It turns out that everything works file after a reboot, however, if I interrupt a debug session, I will get the same error and other applications will also stop working. Is there something I can do to make sure the intercept layer can be interrupted like that?
Here is a screen capture showing the problem inside Visual Studio. What happens in the video
https://www.youtube.com/watch?v=O4iBnyq9U9Y
- Machine just rebooted
- Visual Studio debugging session with the code found here
- Debugging session runs with no problems
- Debugging session starts again. Green message, everything is fine
- Debugging session is now interrupted
- Intercept layer stops working
Hello, thank you again for your excellent description!
I think I know what's happening, and in short you found a very reliable way to get into the "stuck" state I mentioned above. Basically, when Visual Studio ends the debugging session it doesn't call into DLLs to detach. As a result, the intercept layer shutdown routine doesn't get called, and MDAPI is not properly closed. So, for the next execution, MDAPI is already enabled, and the call to OpenIoStream fails.
I'm in contact with the MDAPI folks to see if there's a reliable way to reset MDAPI if it gets into this state. Stay tuned...
Excellent. Glad to help
Any news on this issue?
I've filed a public issue against the metrics library to track this issue:
Thanks for the help
See comment on the issue filed above: intel/metrics-discovery#18 (comment)
This issue should no longer occur due to improvements within MDAPI. If you still encounter issues please let me know and I will try some experiments calling IAdapter_1_6::Reset()
to get out of the "stuck" state - thanks!