XGL cache creator tool
s-perron opened this issue · 4 comments
Background
There has been a lot of work in LLPC over the past 9 months to implement relocatable shaders. These were intended to provide a way to compile shaders “offline”, that is, without running a Vulkan application. However, for that to be useful, there must be a way for a Vulkan application to make use of the precompiled shaders.
To this end, we want to write a tool that will take the precompiled shaders and build a file whose contents can be passed as the initial data to vkCreatePipelineCache. This tool should live in the XGL repository because XGL controls the format of the cache when it is serialized by vkGetPipelineCacheData.
This will be a standalone tool that will have its own subdirectory in the tools directory.
Implementation details
Prerequisites
- Game developers must be able to use amdllpc to compile shaders and get an elf file.
- The elf file will contain the cache hash for the shader/pipeline in the PAL metadata.
  - The PAL metadata already contains the internal pipeline hash, but that seems to be compacted to 64-bits before it is assigned. Could this be expanded to 128-bits?
  - This entry is only added during pipeline finalization, so the PAL metadata for a relocatable shader does not currently contain it.
  - If we decide we only want this to work for relocatable shaders, then we could add something specific to relocatable shaders, but I would like something more general.
XGL cache creator
Command line interface
xgl_cache_creator [options] <input elf files>
Options:
-o <filename> - The filename to output the cache data to.
Required.
-device_id=<device id> - The device id of the device this cache will be used on.
The device id can be found at
https://devicehunt.com/view/type/pci/vendor/1002.
If this option is not provided, the device id will be
queried from the runtime.
-uuid=<uuid> - The uuid for the specific driver and machine.
<How can the uuid be found?>
If this option is not provided, the uuid will be
queried from the runtime.
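As a rough illustration of the interface above, the options could be parsed along these lines. This is only a sketch: the `Options` struct and the `parseArgs` function are hypothetical names, and accepting the device id as decimal or `0x`-prefixed hex is an assumption, not something this proposal has settled.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line.
struct Options {
  std::string outputFile;              // -o <filename>, required
  bool hasDeviceId = false;            // false: query the runtime instead
  uint32_t deviceId = 0;               // -device_id=<device id>
  bool hasUuid = false;                // false: query the runtime instead
  std::string uuid;                    // -uuid=<uuid>, kept as text here
  std::vector<std::string> inputFiles; // <input elf files>
};

// Returns false when the command line is malformed or -o is missing.
bool parseArgs(const std::vector<std::string> &args, Options &opts) {
  const std::string deviceIdPrefix = "-device_id=";
  const std::string uuidPrefix = "-uuid=";
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &arg = args[i];
    if (arg == "-o") {
      if (i + 1 == args.size())
        return false; // -o needs a filename after it
      opts.outputFile = args[++i];
    } else if (arg.compare(0, deviceIdPrefix.size(), deviceIdPrefix) == 0) {
      const std::string value = arg.substr(deviceIdPrefix.size());
      try {
        std::size_t used = 0;
        // Assumption: base 0 lets the id be written as decimal or 0x hex.
        const unsigned long id = std::stoul(value, &used, 0);
        if (used != value.size() || id > 0xFFFFFFFFul)
          return false; // trailing garbage or out of range
        opts.deviceId = static_cast<uint32_t>(id);
        opts.hasDeviceId = true;
      } catch (const std::exception &) {
        return false; // not a number at all
      }
    } else if (arg.compare(0, uuidPrefix.size(), uuidPrefix) == 0) {
      opts.uuid = arg.substr(uuidPrefix.size());
      opts.hasUuid = true;
    } else {
      opts.inputFiles.push_back(arg); // anything else is an input elf file
    }
  }
  return !opts.outputFile.empty();
}
```

When `hasDeviceId` or `hasUuid` is false, the tool would fall back to querying the runtime, as described in the option text.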
Algorithm
- Open the output file and set the position past the header size
- Initialize the key platform using the uuid.
- For each input elf file:
  - Open the file, and copy the contents to the output buffer.
  - Add the contents of the file to the hash context.
  - I would like to avoid having everything in memory at the same time.
- Output the PipelineBinaryCachePrivateHeader using the standard malloc and free as the allocators.
- Output the header generated by vkGetPipelineCacheData.
Task list
- Modify the internal pipeline hash to be a 128-bit value for the cache hash. (Cannot do)
- Modify LLPC to emit a new elf section llpc_cache_hash containing the 128-bit hash for the ELF file being generated.
- Modify the unlinked shader path in LLPC to add the internal pipeline hash to the metadata.
- Refactor PhysicalDevice::InitializePlatformKey so:
  - The UUID is passed in as a parameter and used in place of the device properties.
  - Allocation functions are passed as parameters so the vk instance is not needed.
  - The time stamp is not used since the UUID is already the result of hashing the time stamp.
  - The platform key that is created is returned.
  - Make it available to the cache creator tool without needing a PhysicalDevice.
- Refactor CalculateHashId:
  - Replace the pInstance parameter with the allocation and deallocation functions, so that an Instance is not needed.
  - Make it available to the cache creator tool.
- Refactor the vkGetPipelineCacheData code that writes the header into a function (WriteVkCacheHeader?) the cache creator tool can call.
- Write the cache creator tool:
  - Uses the new InitializePlatformKey, CalculateHashId, and WriteVkCacheHeader.
  - Uses the ElfReader to read the elf and extract the PAL metadata.
  - Uses MsgPackReader to read the PAL metadata to get the hash.
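To make the WriteVkCacheHeader item more concrete, here is a sketch of what such a helper might produce. The layout follows the version-one pipeline cache header from the Vulkan specification (header size, header version, vendor id, device id, then the 16-byte pipeline cache UUID). The function name, its signature, and returning a byte vector are assumptions for illustration, and any XGL-private data that follows this header is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kHeaderVersionOne = 1; // VK_PIPELINE_CACHE_HEADER_VERSION_ONE
constexpr std::size_t kUuidSize = 16;     // VK_UUID_SIZE

// Appends a 32-bit value, least significant byte first.
static void appendLe32(std::vector<uint8_t> &out, uint32_t value) {
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<uint8_t>(value >> shift));
}

// Hypothetical helper: builds the 32-byte version-one header without
// needing a VkDevice or a PhysicalDevice.
std::vector<uint8_t> writeVkCacheHeader(
    uint32_t vendorId, uint32_t deviceId,
    const std::array<uint8_t, kUuidSize> &pipelineCacheUuid) {
  const uint32_t headerSize =
      static_cast<uint32_t>(4 * sizeof(uint32_t) + kUuidSize);
  std::vector<uint8_t> header;
  appendLe32(header, headerSize);        // length of the whole header
  appendLe32(header, kHeaderVersionOne); // header version
  appendLe32(header, vendorId);          // 0x1002 for AMD
  appendLe32(header, deviceId);          // from -device_id or the runtime
  header.insert(header.end(), pipelineCacheUuid.begin(),
                pipelineCacheUuid.end()); // from -uuid or the runtime
  assert(header.size() == headerSize);
  return header;
}
```

Taking the vendor id, device id, and UUID as plain parameters mirrors the refactoring goal in the task list: the tool supplies values from the command line instead of reading them from device properties.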
Modify the internal pipeline hash to be a 128-bit value for the cache hash.
This might not be so easy. PAL makes use of the internal pipeline cache hash, and they expect it to be 64-bits. Changing that would be a big change.
Can you modify amdllpc to output the 128 bit hash in addition to the elf? Then the input to xgl_cache_creator would be a set of 128 bit hash and elf pairs.
Not really tied to this proposal but it is a bit concerning that we have two different mechanisms for calculating a hash that is stored in the same cache. I'm not sure if this is going to cause issues.
I think the refactoring you mention should be fine. Are you planning to keep the code in the same files they are now or put them in a separate file to minimize what needs to be compiled into xgl_cache_creator?
Can you modify amdllpc to output the 128 bit hash in addition to the elf? Then the input to xgl_cache_creator would be a set of 128 bit hash and elf pairs.
If you are thinking of having amdllpc output two separate files, then I would not like that idea. I want the hash to somehow be included in the elf file so that less bookkeeping needs to be done by other tools. However, I am willing to do that if that is what you want.
My preference would be to do something like add a new section to the elf that contains the hash, or a new PAL metadata entry. The elf section would partially do what you want, "minimize what needs to be compiled into xgl_cache_creator", because the tool won't need to read the PAL metadata.
Are you planning to keep the code in the same files they are now or put them in a separate file to minimize what needs to be compiled into xgl_cache_creator?
I would like a separate file, but I'm not too concerned about that.
You can have a section of the elf that PAL ignores. GetGenericSection() and SetGenericSection() are the functions to do that in PAL.