GPUOpen-Drivers/xgl

XGL cache creator tool

s-perron opened this issue · 4 comments

Background

There has been a lot of work in LLPC over the past 9 months to implement relocatable shaders. These were intended to provide a way to compile shaders “offline”, that is, without running a Vulkan application. However, for that to be useful, there must be a way for a Vulkan application to make use of the precompiled shaders.

To this end, we want to write a tool that will take the precompiled shaders and build a file whose contents can be passed as the initial data to vkCreatePipelineCache. This tool should live in the XGL repository because XGL controls the format of the cache when it is serialized by vkGetPipelineCacheData.

This will be a standalone tool that will have its own subdirectory in the tools directory.

Implementation details

Prerequisites

  • Game developers must be able to use amdllpc to compile shaders and get an elf file.
  • The elf file will contain the cache hash for the shader/pipeline in the PAL metadata.
    • The PAL metadata already contains the internal pipeline hash, but that seems to be compacted to 64 bits before it is assigned. Could this be expanded to 128 bits?
    • This entry is only added during pipeline finalization, so the PAL metadata for a relocatable shader does not currently contain it.
    • If we decide we only want this to work for relocatable shaders, then we could add something specific to relocatable shaders, but I would like something more general.

XGL cache creator

Command line interface

xgl_cache_creator [options] <input elf files>

Options:

-o <filename>          - The filename to output the cache data to. 
                         Required.
-device_id=<device id> - The device id of the device this cache will be used on.
                         The device id can be found at
                         https://devicehunt.com/view/type/pci/vendor/1002.
                         If this option is not provided, the device id will be
                         queried from the runtime.
-uuid=<uuid>           - The uuid for the specific driver and machine.
                         <How can the uuid be found?>
                         If this option is not provided, the uuid will be
                         queried from the runtime.
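To make the interface above concrete, here is a minimal sketch of how the arguments could be parsed. The struct and function names (`CacheCreatorOptions`, `ParseArgs`) are hypothetical, and it assumes the device id is given in hexadecimal, as it is listed on the page linked above. The uuid is kept as plain text because its textual format is still an open question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed command line for the proposed tool. All names here are illustrative.
struct CacheCreatorOptions {
    std::string outputFile;                  // -o <filename> (required)
    bool hasDeviceId = false;                // false: query the runtime instead
    uint32_t deviceId = 0;                   // -device_id=<device id>, assumed hex
    std::string uuid;                        // -uuid=<uuid>; empty: query the runtime
    std::vector<std::string> inputElfFiles;  // remaining positional arguments
};

// Returns true and stores the remainder in `rest` when `arg` starts with `prefix`.
static bool ConsumePrefix(const std::string& arg, const std::string& prefix,
                          std::string* rest) {
    assert(rest != nullptr);
    if (arg.compare(0, prefix.size(), prefix) != 0) return false;
    *rest = arg.substr(prefix.size());
    return true;
}

CacheCreatorOptions ParseArgs(const std::vector<std::string>& args) {
    CacheCreatorOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        std::string value;
        if (args[i] == "-o") {
            if (i + 1 >= args.size()) throw std::invalid_argument("-o needs a filename");
            opts.outputFile = args[++i];
        } else if (ConsumePrefix(args[i], "-device_id=", &value)) {
            // Base 16 accepts the value with or without a leading "0x".
            opts.deviceId = static_cast<uint32_t>(std::stoul(value, nullptr, 16));
            opts.hasDeviceId = true;
        } else if (ConsumePrefix(args[i], "-uuid=", &value)) {
            opts.uuid = value;
        } else {
            // Anything else is treated as an input elf file.
            opts.inputElfFiles.push_back(args[i]);
        }
    }
    if (opts.outputFile.empty()) throw std::invalid_argument("-o is required");
    return opts;
}
```

Options left unset (`hasDeviceId == false`, empty `uuid`) are the cases where the tool would fall back to querying the runtime.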

Algorithm

  • Open the output file and set the position past the header size
  • Initialize the key platform using the uuid.
  • For each input elf file
    • Open the file, and copy the contents to the output buffer.
    • Add the contents of the file to the hash context
      • I would like to avoid having everything in memory at the same time.
  • Output the PipelineBinaryCachePrivateHeader using the standard malloc and free as the allocators.
  • Output the header generated by vkGetPipelineCacheData
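The streaming part of the steps above could look roughly like the sketch below. It only illustrates the "reserve the header, stream the files, write the header last" pattern. `HashContext` is a stand-in that uses 64-bit FNV-1a, not the hash XGL uses. `kReservedHeaderSize`, `BuildCacheFile`, and the 8-byte placeholder written into the header region are assumptions. Platform key initialization and the real layouts of `PipelineBinaryCachePrivateHeader` and the Vulkan header are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Stand-in for the real hash context: 64-bit FNV-1a, NOT the hash XGL uses.
struct HashContext {
    uint64_t state = 14695981039346656037ull;  // FNV-1a offset basis
    void Update(const char* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            state ^= static_cast<unsigned char>(data[i]);
            state *= 1099511628211ull;  // FNV-1a prime
        }
    }
};

// Hypothetical number of bytes reserved at the start of the file for the headers.
constexpr std::size_t kReservedHeaderSize = 64;

// Copies every input file into `outputPath` after a reserved header region,
// hashing the bytes as they pass through, then fills in a placeholder header.
bool BuildCacheFile(const std::string& outputPath,
                    const std::vector<std::string>& inputElfPaths) {
    std::ofstream out(outputPath, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    // Step 1: move past the header. Writing zeros (rather than seeking past
    // the end) keeps the file well-formed even when there are no inputs.
    const std::vector<char> padding(kReservedHeaderSize, '\0');
    out.write(padding.data(), static_cast<std::streamsize>(padding.size()));

    // Step 2: stream each file in fixed-size chunks so that only one chunk
    // is held in memory at a time.
    HashContext hash;
    std::array<char, 4096> chunk;
    for (const std::string& path : inputElfPaths) {
        std::ifstream in(path, std::ios::binary);
        if (!in) return false;
        while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size())) ||
               in.gcount() > 0) {
            const std::streamsize count = in.gcount();
            assert(count > 0);
            out.write(chunk.data(), count);
            hash.Update(chunk.data(), static_cast<std::size_t>(count));
        }
    }

    // Step 3: seek back and write the header now that the hash is known.
    // Only the raw digest is written here, as a placeholder for the real headers.
    out.seekp(0);
    out.write(reinterpret_cast<const char*>(&hash.state), sizeof(hash.state));
    return static_cast<bool>(out);
}
```

Because the header depends on the hash of everything that follows it, writing it last is what allows the inputs to be streamed instead of buffered.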

Task list

  • Modify the internal pipeline hash to be a 128-bit value for the cache hash. (Cannot do)
  • Modify LLPC to emit a new elf section llpc_cache_hash containing the 128-bit hash for the ELF file being generated.
  • Modify the unlinked shader path in LLPC to add the internal pipeline hash to the metadata.
  • Refactor PhysicalDevice::InitializePlatformKey so:
    • The UUID is passed in as a parameter and used in place of the device properties.
    • Allocation functions are passed as parameters so the vk instance is not needed.
    • The time stamp is not used since the UUID is already the result of hashing the time stamp.
    • The platform key that is created is returned.
    • Make it available to the cache creator tool without needing a PhysicalDevice.
  • Refactor CalculateHashId:
    • Replace the pInstance parameter with the allocation and deallocation functions, so that an Instance is not needed.
    • Make it available to the cache creator tool.
  • Refactor vkGetPipelineCacheData code that writes the header into a function (WriteVkCacheHeader?) the cache creator tool can call.
  • Write the cache creator tool:
    • Uses the new InitializePlatformKey, CalculateHashId, and WriteVkCacheHeader.
    • Uses the ElfReader to read the elf and extract the PAL metadata.
    • Uses MsgPackReader to read the PAL metadata to get the hash.
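For the WriteVkCacheHeader item, the Vulkan specification fixes the layout that vkGetPipelineCacheData must produce for header version one: header size, header version, vendor ID, device ID, then the 16-byte pipeline cache UUID, with the integers written least significant byte first. A sketch of a standalone writer follows. The function signature is an assumption, and it uses local constants instead of the Vulkan headers so that it compiles on its own. Whatever XGL appends after this header is not covered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local copies of values defined by the Vulkan headers, so the sketch is standalone.
constexpr uint32_t kHeaderVersionOne = 1;  // VK_PIPELINE_CACHE_HEADER_VERSION_ONE
constexpr std::size_t kUuidSize = 16;      // VK_UUID_SIZE
constexpr uint32_t kAmdVendorId = 0x1002;  // AMD's PCI vendor ID

// Appends a 32-bit value, least significant byte first.
static void AppendLe32(std::vector<uint8_t>* out, uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out->push_back(static_cast<uint8_t>(value >> shift));
    }
}

// Hypothetical shape for the proposed helper: no Instance or PhysicalDevice
// is needed, only the values that identify the target device and driver.
std::vector<uint8_t> WriteVkCacheHeader(uint32_t vendorId, uint32_t deviceId,
                                        const std::array<uint8_t, kUuidSize>& uuid) {
    std::vector<uint8_t> header;
    AppendLe32(&header, static_cast<uint32_t>(16 + kUuidSize));  // headerSize
    AppendLe32(&header, kHeaderVersionOne);                      // headerVersion
    AppendLe32(&header, vendorId);                               // vendorID
    AppendLe32(&header, deviceId);                               // deviceID
    header.insert(header.end(), uuid.begin(), uuid.end());       // pipelineCacheUUID
    assert(header.size() == 32);
    return header;
}
```

The `deviceId` and `uuid` parameters correspond to the `-device_id` and `-uuid` options. A driver typically rejects initial data whose header does not match the device it is running on, which is why both values have to be right for the target machine.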

Modify the internal pipeline hash to be a 128-bit value for the cache hash.

This might not be so easy. PAL makes use of the internal pipeline cache hash and expects it to be 64 bits. Changing that would be a big change.

Can you modify amdllpc to output the 128 bit hash in addition to the elf? Then the input to xgl_cache_creator would be a set of 128 bit hash and elf pairs.

Not really tied to this proposal but it is a bit concerning that we have two different mechanisms for calculating a hash that is stored in the same cache. I'm not sure if this is going to cause issues.

I think the refactoring you mention should be fine. Are you planning to keep the code in the same files they are now or put them in a separate file to minimize what needs to be compiled into xgl_cache_creator?

Can you modify amdllpc to output the 128 bit hash in addition to the elf? Then the input to xgl_cache_creator would be a set of 128 bit hash and elf pairs.

If you are thinking of amdllpc outputting two separate files, then I would not like that idea. I want the hash to somehow be included in the elf file so that less bookkeeping needs to be done by other tools. However, I am willing to do that if that is what you want.

My preference would be to do something like add a new section to the elf that contains the hash, or a new pal metadata entry. The elf section would partially do what you want, "minimize what needs to be compiled into xgl_cache_creator", because the tool won't need to read the pal metadata.
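To give an idea of how little the tool would need on the elf-section route, here is a rough sketch that looks up a section by name in a 64-bit ELF image using only the standard ELF header and section header layouts. `FindElfSection` is a hypothetical helper, not the existing ElfReader, and it assumes a little-endian host and a little-endian ELF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Copies a T from `offset` in the image; assumes host and file are little-endian.
template <typename T>
static bool ReadAt(const std::vector<uint8_t>& elf, uint64_t offset, T* value) {
    if (offset > elf.size() || elf.size() - offset < sizeof(T)) return false;
    std::memcpy(value, elf.data() + offset, sizeof(T));
    return true;
}

// Finds the section called `name` in an ELF64 image and copies its bytes to `out`.
bool FindElfSection(const std::vector<uint8_t>& elf, const std::string& name,
                    std::vector<uint8_t>* out) {
    assert(out != nullptr);
    // Check the magic number and EI_CLASS == ELFCLASS64 (2).
    if (elf.size() < 64 || std::memcmp(elf.data(), "\x7f" "ELF", 4) != 0 || elf[4] != 2)
        return false;

    // Elf64_Ehdr fields: e_shoff, e_shentsize, e_shnum, e_shstrndx.
    uint64_t shoff = 0;
    uint16_t shentsize = 0, shnum = 0, shstrndx = 0;
    if (!ReadAt(elf, 0x28, &shoff) || !ReadAt(elf, 0x3A, &shentsize) ||
        !ReadAt(elf, 0x3C, &shnum) || !ReadAt(elf, 0x3E, &shstrndx))
        return false;
    if (shstrndx >= shnum) return false;

    // Locate the section-name string table (sh_offset at 0x18, sh_size at 0x20).
    const uint64_t strHdr = shoff + uint64_t(shstrndx) * shentsize;
    uint64_t strOff = 0, strSize = 0;
    if (!ReadAt(elf, strHdr + 0x18, &strOff) || !ReadAt(elf, strHdr + 0x20, &strSize))
        return false;
    if (strOff > elf.size() || strSize > elf.size() - strOff) return false;

    for (uint16_t i = 0; i < shnum; ++i) {
        const uint64_t hdr = shoff + uint64_t(i) * shentsize;
        uint32_t nameOff = 0;
        uint64_t off = 0, size = 0;
        if (!ReadAt(elf, hdr, &nameOff) || !ReadAt(elf, hdr + 0x18, &off) ||
            !ReadAt(elf, hdr + 0x20, &size))
            return false;
        if (nameOff >= strSize) continue;

        // Section names are NUL-terminated strings inside the string table.
        const char* begin = reinterpret_cast<const char*>(elf.data() + strOff + nameOff);
        const std::size_t maxLen = static_cast<std::size_t>(strSize - nameOff);
        const void* nul = std::memchr(begin, 0, maxLen);
        const std::size_t len = nul ? static_cast<const char*>(nul) - begin : maxLen;
        if (std::string(begin, len) != name) continue;

        if (off > elf.size() || size > elf.size() - off) return false;
        out->assign(elf.begin() + off, elf.begin() + off + size);
        return true;
    }
    return false;
}
```

With something like this, the cache creator would call `FindElfSection(elf, "llpc_cache_hash", &hash)` and check that 16 bytes came back, without pulling in a MsgPack reader for the pal metadata.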

Are you planning to keep the code in the same files they are now or put them in a separate file to minimize what needs to be compiled into xgl_cache_creator?

I would like a separate file, but I'm not too concerned about that.

You can have a section of the elf that PAL ignores. GetGenericSection() and SetGenericSection() are the functions to do that in PAL.