The Crash Diagnostic Layer (CDL) is a Vulkan layer to help track down and identify the cause of GPU hangs and crashes. It works by instrumenting command buffers with completion checkpoints. When an error is detected a dump file containing incomplete command buffers is written. Often the last complete or incomplete commands are responsible for the crash.
See the associated BUILD.md file for details on how to build the Crash Diagnostic Layer for various platforms.
CDL uses the following extensions. If an extension is not present, some functionality might be disabled.
VK_KHR_timeline_semaphore
(or Vulkan 1.2) - used to track forward progress of queue submissionsVK_NV_device_diagnostic_commands
- used to track forward progress within command buffersVK_AMD_buffer_marker
- used to track forward progress within command buffers (if the NV extension is not present) and the state of binary semaphores.VK_AMD_device_coherent_memory
- improves the accuracy of VK_AMD_buffer_marker trackingVK_EXT_device_fault
- allows drivers to report data related to aVK_ERROR_DEVICE_LOST
, such as invalid device addresses.VK_EXT_device_address_binding_report
- allows drivers to report lifecycle information about device address assignments, which can help identify the source of device faults.
CDL can be used as an explicit or implicit layer. The loader's documentation describes the difference between implicit and explicit layers, but the relevant bit here is that implicit layers are meant to be available to all applications on the system, even if the application doesn't explicitly enable the layer. On the other hand, explicit layers are easier to use when doing application development.
To use CDL as an explicit layer, enable it with vkconfig
or do the following:
- The directory containing the file
VkLayer_crash_diagnostic.json
is included in the layer search path, by including it in either theVK_LAYER_PATH
orVK_ADD_LAYER_PATH
environment variable or by usingvkconfig
. - Include the layer name in the VK_INSTANCE_LAYERS environment variable or the ppEnabledLayerNames field of VkInstanceCreateInfo.
When using CDL as an implicit layer, it is disabled by default. To enable the layer's functionality, set the CDL_ENABLE
environment variable to 1.
In order to be discovered by the Vulkan loader at runtime, implicit layers must be registered. The registration process is platform-specific and is discussed in detail in the Vulkan-Loader documentation. In all cases, it is the layer manifest (the .json file) that is registered; the manifest contains a relative path to the layer library, which can be in a separate directory.
Once the layer is enabled, if vkQueueSubmit()
or other Vulkan functions returns a fatal error code (usually VK_ERROR_DEVICE_LOST
), a dump file of the command buffers (and other state) that failed to execute are written to disk.
CDL outputs messages at runtime to indicate it is enabled, to trace certain commands, and to report when a gpu crash has occurred. The message_severity
, log_file
, trace_on
and trace_all_semaphores
configuration settings control which messages are output and where they are sent.
VK_EXT_debug_utils
and VK_EXT_debug_report
extensions can also be used to receive log messages directly in the application.
A unique log file directory is created every time an application is run with CDL enabled. The log file directories are named with the timestamp of the format YYYY-MM-DD-HHMMSS. This prevents subsequent runs from overwriting old data, in case the application is non-deterministic and producing different crashes on each run. The location of these files is platform-specific:
- Linux:
${HOME}/cdl/
- Windows:
%USERPROFILE%\cdl\
The output directory for dump files can be changed using the output_path
configuration setting, described below.
The name of the dump file is cdl_dump.yaml
, since the file is in the YAML format. Other files, such as dumped shaders, may be present in the dump directory.
This layer implements the VK_EXT_layer_settings
extension, so it can be configured with vkconfig
, programmatically, or with environment variables. See the manifest for this layer and the layer manifest schema for full details. The discussion below uses the key
field to identify each setting, the env
field defines the corresponding environment variable, and the label
field defines the text you will see when using vkconfig
.
watchdog_timeout_ms
can be set to enable a watchdog thread that monitors the time between queue submissions. If this timeout value (specified in milliseconds) is hit without a queue submission, the gpu is assumed to be crashed and a dump file is created.- Dump files
output_path
can be set to override the directory where log files and shader binaries are written. This can be a full path (starting with/
or a drive letter) or a path relative to the application current working directory.dump_queue_submissions
controls which queue submissions are dumped.running
causes only the submission currently executing to be dumped.pending
will also dump any submissions that have not started execution.dump_command_buffers
controls which command buffers are dumped.running
causes only the command buffer currently executing to be dumped.pending
will also dump any command buffers that have not started execution.all
will dump all known command buffers.dump_commands
controls which commands are dumped.running
causes only the commands currently executing to be dumped.pending
will also dump any commands that have not started execution.all
will dump all commands in the command buffer.dump_shaders
controls if shaders are included in the dump directory. Possible values for this setting are:off
- no output,on_crash
- only dump the shaders that are bound at the time of a gpu crash,on_bind
- dump shaders only when they are bound, andall
- dump all shaders as soon as they are created.
- Logging
message_severity
can be set to a comma-separated list of the types of messages CDL should output to the default logger. Application defined loggers should control which messages they want to recieve with the options available in theVK_EXT_debug_utils
orVK_EXT_debug_report
extensions.log_file
can be set to control where log messages are sent by the default logger. There are several special values.stderr
andstdout
send messages to the application console.none
disables the default logger. Any other value is assumed to be an absolute or relative path to the log file.trace_on
can be enabled to cause important commands, such asvkQueueSubmit
to be logged as they occur.trace_all_semaphores
enables logging messages about every vulkan command that uses semaphores.
- State tracking
sync_after_commands
adds a pipeline barrier after every instrumented vulkan command. This will reduce performance and may cause some hangs to go away. This option currently only works when usingVK_KHR_dynamic_rendering
instrument_all_commands
can be enabled to include completion markers around every vulkan command. This may allow more accuratute fault locations at the expense of larger command buffers and reduced performance.track_semaphores
enables detailed semaphore state reporting in runtime logging and dump files.VK_AMD_buffer_marker
is required for this feature.