GPUOpen-Drivers/xgl

Making cache creator usable with the open-source AMDVLK

Closed this issue · 1 comments

kuhar commented

Objective

In addition to making cache creator usable for our internal partners/developers, we want to reach an agreement on how to make the open-source XGL cache creator implementation usable. Open-source users/developers should be able to compile shaders offline and bundle them as a valid pipeline binary cache file on a machine without the target GPU, and then be able to load it on a machine with a compatible AMDVLK/AMDGPU installation.

We propose to change how the hashes included in pipeline cache blobs are calculated and to add a new tool, cache-info, to enable this cache creation flow.

Background

Our prototype cache-creator implementation is available on a public XGL fork. The intended offline relocatable shader cache creation flow looks as follows:

  1. Compile shaders into .spv files.
  2. Compile all .spv files offline with amdllpc into relocatable shader elfs:
    amdllpc -gfxip=9.0.0 -unlinked -enable-relocatable-shader-elf shader.spv -o ./elfs/shader.elf
    
  3. Bundle all elf files into a pipeline binary cache blob file with the cache-creator tool:
    tools/cache-creator --device-id=0x6860 --uuid=6d4570f9-78b4-ef2f-fb61-f46919af88b4 -o cache.bin ./elfs/*.elf
    
  4. Run the Vulkan application and provide the pipeline cache file as a blob with vkCreatePipelineCache.
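
As a rough illustration, steps 2 and 3 could be scripted along these lines. This is only a sketch: the helper names are hypothetical, and the flags are copied from the example invocations above rather than from any tool documentation.

```python
from pathlib import Path


def amdllpc_command(spv, elf_dir, gfxip="9.0.0"):
    """Build the step 2 amdllpc invocation for one SPIR-V file."""
    spv = Path(spv)
    # shader.spv -> <elf_dir>/shader.elf
    elf = Path(elf_dir) / (spv.stem + ".elf")
    return [
        "amdllpc",
        f"-gfxip={gfxip}",
        "-unlinked",
        "-enable-relocatable-shader-elf",
        str(spv),
        "-o",
        str(elf),
    ]


def cache_creator_command(elfs, device_id, uuid, out="cache.bin"):
    """Build the step 3 cache-creator invocation for a set of elf files."""
    return [
        "tools/cache-creator",
        f"--device-id={device_id:#06x}",  # e.g. 0x6860
        f"--uuid={uuid}",
        "-o",
        out,
        *sorted(str(e) for e in elfs),  # sorted for a deterministic order
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine that has the tools installed.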

Before loading the cache contents, the ICD performs a series of checks that make sure that the cache blob: (1) is not corrupted, (2) matches the host driver/GPU. Both of these are performed based on the cache headers:

(1) vk::PipelineCacheHeaderData::headerLength and vk::PipelineBinaryCachePrivateHeader::hashId.
(2) vk::PipelineCacheHeaderData::{headerVersion,vendorID,deviceID,UUID} and vk::PipelineBinaryCachePrivateHeader::hashId.

With AMDVLK, headerVersion and vendorID are known constants, while deviceID is known on a per-GPU-model basis. UUID and hashId are calculated based on the runtime host system configuration and include bits derived from build timestamps. Some unnecessary runtime system information leaking into hashId makes it difficult to build caches offline on a non-target system, which affects both usability and testability of the cache-creator tool.
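
To make the header checks concrete, here is a minimal sketch of reading and validating the public part of the header. It assumes the version-one layout from the Vulkan specification (four little-endian 32-bit fields followed by a 16-byte UUID). The function names are hypothetical, and the hashId check on the private header is deliberately left out because its computation depends on driver internals.

```python
import struct
import uuid

# headerLength, headerVersion, vendorID, deviceID, pipelineCacheUUID[16]
HEADER = struct.Struct("<IIII16s")
AMD_VENDOR_ID = 0x1002


def parse_header(blob):
    """Decode the first 32 bytes of a pipeline cache blob."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is too short to hold a pipeline cache header")
    length, version, vendor, device, raw_uuid = HEADER.unpack_from(blob)
    return {
        "headerLength": length,
        "headerVersion": version,
        "vendorID": vendor,
        "deviceID": device,
        # Assumption: the UUID bytes are shown in storage order.
        "UUID": uuid.UUID(bytes=raw_uuid),
    }


def matches_host(header, device_id, cache_uuid):
    """Mirror check (2) for the public header fields only."""
    return (
        header["headerLength"] == HEADER.size
        and header["headerVersion"] == 1
        and header["vendorID"] == AMD_VENDOR_ID
        and header["deviceID"] == device_id
        and header["UUID"] == cache_uuid
    )
```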

Pipeline binary cache format changes for offline relocatable cache creation

We want to be able to run the cache-creator tool on machines without the target GPU. This will also allow us to set up testing infrastructure on any VM without an AMD GPU. As in the original cache-creator tool proposal, we want to be able to specify all the necessary hardware/system information through command line arguments.

  1. Device ID (--device-id) and pipeline cache UUID (--uuid) will be extracted from a valid pipeline cache file created on the target machine.
    a. We propose to add a new tool, cache-info, that, given an existing pipeline cache file, outputs its header information and (optionally) a summary of the pipeline cache entries. You can find a prototype implementation here. A simple Vulkan application can create a pipeline cache on the target machine and save it to a file (prototype here), while the rest of the cache creation flow can run on any other machine. This can be further simplified; see the bottom of the section for details.
    b. We separately propose to drop some runtime properties from pipeline cache UUID computation, namely: CPU and memory system information, as they don't affect pipeline compatibility.

A sample cache-info tool invocation looks like this:

$ tools/cache-info cache.bin --elf-source-dir=./elfs
Read: cache.bin, 407588 B

=== Vulkan Pipeline Cache Header ===
header length:          32
header version:         1
vendor ID:              0x1002
device ID:              0x6860
pipeline cache UUID:    e333659f-a1b8-bafe-f19b-04dc792f9e99

=== Pipeline Binary Cache Private Header ===
header length:  20
hash ID:        f1a233ff dd873311 0004432 ffff0ddf 2a303455

=== Cache Blob Info ===
content size:   407536

        *** Entry 0 ***
        hash ID:        4198f242 044ad73b 02283345 4845466c
        data size:      2144
        MD5 sum:        940c7b25c14d997209f7726611c73c1c
        source elf:     ./elfs/attachmentread.frag.elf

        *** Entry 1 ***
        hash ID:        67af7c86 06930f37 7bcf4d76 57426594
        data size:      1552
        MD5 sum:        1c4d73ed2b8ee5535e876940cd710fb6
        source elf:     ./elfs/attachmentread.vert.elf

Note that the second argument, --elf-source-dir, is optional and only used for testing/debugging.
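
For illustration, the device ID and UUID could be pulled out of this output and turned into cache-creator arguments roughly as follows. This sketch assumes the prototype's output format shown above stays stable; the function name is hypothetical.

```python
import re

# Field labels as printed by the sample cache-info run above.
DEVICE_ID_RE = re.compile(r"^device ID:\s+(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
UUID_RE = re.compile(r"^pipeline cache UUID:\s+([0-9a-fA-F-]{36})\s*$", re.MULTILINE)


def cache_creator_flags(cache_info_output):
    """Turn cache-info header output into --device-id/--uuid arguments."""
    device = DEVICE_ID_RE.search(cache_info_output)
    cache_uuid = UUID_RE.search(cache_info_output)
    if device is None or cache_uuid is None:
        raise ValueError("cache-info output is missing the device ID or UUID")
    return [f"--device-id={device.group(1)}", f"--uuid={cache_uuid.group(1)}"]
```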

  2. vk::PipelineBinaryCachePrivateHeader::hashId is calculated with a hash algorithm and initial data provided by Util::IPlatformKey, which includes the full VkPhysicalDeviceProperties struct (including the pipeline cache UUID). We cannot read this data off a pipeline cache file, because hashId blends together the system information and the pipeline cache data. Some possible solutions are:
    a. Include the missing physical device properties (if any) in the UUID itself and use it as the initial data to calculate hashId.
    b. Make it possible to serialize IPlatformKey and save it in a new field in PipelineBinaryCachePrivateHeader. This way we can read it off existing pipeline binary cache files and pass it as a new argument to the cache-creator tool.

The changes in 2. could be conditionally enabled only in the relocatable compilation mode if you don't want to change the behavior in regular online full-pipeline compilation. We would prefer the first option (2a) and would like to understand which physical properties and system info fields actually affect cache compatibility and use only those.
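
The idea behind option 2a can be sketched as follows. This is purely illustrative: SHA-1 is a stand-in chosen because its 20-byte digest matches the private header length shown earlier, not because the driver is known to use it, and the function name is hypothetical. The real algorithm and initial data come from Util::IPlatformKey.

```python
import hashlib


def hash_id(initial_data, cache_contents):
    """Stand-in for the hashId computation: hash initial data, then contents."""
    h = hashlib.sha1()  # assumption: placeholder for the platform key's algorithm
    h.update(initial_data)
    h.update(cache_contents)
    return h.digest()


# With option 2a the initial data is just the pipeline cache UUID, so an
# offline tool that knows the UUID can reproduce the value the ICD expects.
uuid_bytes = bytes.fromhex("e333659fa1b8bafef19b04dc792f9e99")
contents = b"example cache entries"
offline = hash_id(uuid_bytes, contents)
on_target = hash_id(uuid_bytes, contents)
```

If the initial data also includes host-only information, the two values differ and the offline cache is rejected, which is the problem described above.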

If 2a sounds like a good way forward, we could further simplify the cache creation flow by extending the vulkaninfo tool to print pipeline cache UUID, so that we can take IDs from its output instead of having to create a cache file with some Vulkan application and read those values with cache-info.

Testing

Cache creator unit tests in XGL

We propose to use a unit testing framework in XGL, e.g., Doctest or Google Test. You can find the prototype unit test implementation with Doctest here. The main difference between the two frameworks is that Doctest consists of a single header file, which makes it a very small dependency that is easy to build with any build system. Concretely, unit tests will be a new CMake executable target and will execute on any build machine, i.e., a GPU won't be required.

Unit tests will only be built when the rest of cache creator is enabled with -DXGL_ENABLE_CACHE_CREATOR=ON, but we could further hide them behind an additional CMake option. We will run these unit tests in the public LLPC CI and could also easily make a similar CI setup for XGL based on GitHub Actions.

LIT tests

We further propose to add offline end-to-end tests based on the LIT testing infrastructure. LIT is used in LLPC to run the shaderdb tests. The testing flow will look as follows:

  1. Compile a few SPIR-V files with amdllpc in the relocatable compilation mode.
  2. Create a pipeline binary cache file from the elf files with cache-creator, using a fake deviceID and UUID.
  3. Run cache-info to check if the cache header is fine and the cache contents match the elf files from 1.

Note that this doesn't require a GPU, so these tests can be placed either in XGL or LLPC's shaderdb.
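
The content check in step 3 could, for example, compare the MD5 sums reported for cache entries against the elf files on disk. The sketch below assumes each cache entry is byte-identical to its source elf, which the "MD5 sum" and "source elf" fields in the cache-info output suggest but which is not confirmed here. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_elfs(elf_dir):
    """Map MD5 hex digest -> file name for every .elf file in elf_dir."""
    return {
        hashlib.md5(path.read_bytes()).hexdigest(): path.name
        for path in sorted(Path(elf_dir).glob("*.elf"))
    }


def unmatched_entries(entry_md5s, elf_dir):
    """Return the entry digests that correspond to no elf file in elf_dir."""
    known = md5_of_elfs(elf_dir)
    return [digest for digest in entry_md5s if digest not in known]
```

An empty result means every cache entry was traced back to one of the elf files from step 1.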

End-to-end cache hit tests

We don't yet have a complete proposal for on-GPU tests that would check whether we get the expected cache hits. These would have to run under Jenkins, as public VMs don't come with AMD GPUs.

One important change that we propose is to add an option to disable the ArchiveLayer in PipelineBinaryCache, so that we don't get unintended cache hits based on system-level persistent cache files.

Summary

We propose to make cache-creator usable and testable through a flow in which cache header IDs are read from valid pipeline cache files with a new tool, cache-info, and then provided to the cache-creator tool. To make this possible, we need to modify how the cache header field hashId is handled.

We propose to introduce two levels of public tests: unit tests in XGL and LIT tests in XGL or LLPC, which won't require a GPU to run. To make it easier to test cache hits with Vulkan applications, we propose to add an option to disable the ArchiveLayer in PipelineBinaryCache.

The prototype implementation that allows the proposed offline flows, together with the unit tests, is available at https://github.com/kuhar/xgl/tree/cct/tools/cache_creator.

kuhar commented

Status update: the majority of this proposal has been implemented and merged. The only missing piece is the ability to load cache files created offline; this is pending Pipeline Cache UUID refactoring changes (similar to #90).