Pipeline cache serialization/deserialization investigation
expenses opened this issue · 5 comments
Goal
It would be neat and useful to have an implementation of get_pipeline_cache_data on all modern platforms (Vulkan, DX12, Metal). Along with the corresponding code in create_pipeline_cache, this would let us cache pipelines to disk on all backends, giving a good performance boost when a lot of pipelines are used.
Status
Vulkan
The Vulkan API has a get_pipeline_cache_data function built in.
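Since the Vulkan cache blob is opaque and device-specific, a loader that reads it back from disk should check the blob's header against the current device before passing it to create_pipeline_cache. Here is a minimal sketch of that check, assuming the layout of the version-one pipeline cache header from the Vulkan spec (header length, header version, vendorID, deviceID, pipelineCacheUUID, all integers little-endian); the function name is mine, not a gfx-rs API:

```rust
// Sketch: validate a Vulkan pipeline cache blob loaded from disk before
// handing it to create_pipeline_cache. Layout follows the version-one
// pipeline cache header described in the Vulkan spec.
fn cache_blob_matches_device(
    blob: &[u8],
    vendor_id: u32,
    device_id: u32,
    cache_uuid: &[u8; 16],
) -> bool {
    if blob.len() < 32 {
        return false;
    }
    let read_u32 =
        |offset: usize| u32::from_le_bytes(blob[offset..offset + 4].try_into().unwrap());
    read_u32(0) == 32                // header length (32 for version one)
        && read_u32(4) == 1          // VK_PIPELINE_CACHE_HEADER_VERSION_ONE
        && read_u32(8) == vendor_id  // must match VkPhysicalDeviceProperties
        && read_u32(12) == device_id
        && &blob[16..32] == cache_uuid // pipelineCacheUUID
}
```

If the check fails (driver update, different GPU), the right move is to discard the blob and start from an empty cache rather than error out.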
Metal
Edit: disregard this whole section, see #3716 (comment).
The Metal backend has a pipeline cache:
gfx/src/backend/metal/src/native.rs, lines 207 to 211 at 2a93d52
However, there is no way to serialize or deserialize it at present.
The key blocker for this is that the ModuleInfo struct stores a metal::Library:
gfx/src/backend/metal/src/native.rs, lines 200 to 205 at 2a93d52
While there is no way in the Metal API to serialize a MTLLibrary (the underlying type), there is a serialize function for MTLDynamicLibrary, which I believe we could convert the library into. It serializes directly into a file, though, which is pretty gross. Presumably we'd then have to read back from that file.
The other option would be to just store the Metal source code for each shader that has been converted from SPIR-V. This would not give as big a performance improvement, though.
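That "store the translated source" option boils down to a map from SPIR-V module to MSL string that can round-trip through bytes. A minimal sketch, assuming an illustrative hash and a simple length-prefixed on-disk layout (neither is what gfx-rs actually uses):

```rust
use std::collections::HashMap;

// Hash the SPIR-V bytes to get a cache key. FNV-1a is used here only to keep
// the sketch dependency-free; a real cache would want a stronger hash.
fn hash_spirv(spirv: &[u8]) -> u64 {
    spirv
        .iter()
        .fold(0xcbf29ce484222325u64, |h, &b| {
            (h ^ b as u64).wrapping_mul(0x100000001b3)
        })
}

// Cache of translated MSL source, keyed by SPIR-V hash.
#[derive(Default)]
struct MslCache {
    entries: HashMap<u64, String>,
}

impl MslCache {
    // Encode each entry as: key (8 bytes LE) | source length (4 bytes LE) | source.
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (key, msl) in &self.entries {
            out.extend_from_slice(&key.to_le_bytes());
            out.extend_from_slice(&(msl.len() as u32).to_le_bytes());
            out.extend_from_slice(msl.as_bytes());
        }
        out
    }

    fn deserialize(mut data: &[u8]) -> Self {
        let mut entries = HashMap::new();
        while data.len() >= 12 {
            let key = u64::from_le_bytes(data[..8].try_into().unwrap());
            let len = u32::from_le_bytes(data[8..12].try_into().unwrap()) as usize;
            entries.insert(key, String::from_utf8(data[12..12 + len].to_vec()).unwrap());
            data = &data[12 + len..];
        }
        MslCache { entries }
    }
}
```

On a cache hit the backend would still have to compile the MSL into a MTLLibrary, which is why this option wins less time than serializing the compiled library itself would.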
MoltenVK
MoltenVK implements a pipeline cache with MVKPipelineCache. Similar to what we do with Metal, this stores MVKShaderLibraryCaches, which in turn store MVKShaderLibrarys. Its implementation of getPipelineCacheData writes the Metal source code, similar to the option I suggest above.
As an example of this, here's some of the output of a pipeline cache I generated:
&Y'v�˺GC�^\��@ mainzzzzzmain0>#include <metal_stdlib>
#include <simd/simd.h>
using namespace metal;
struct main0_out
{
float4 uFragColor [[color(0)]];
};
struct main0_in
{
float4 o_color [[user(locn0)]];
};
fragment main0_out main0(main0_in in [[stage_in]])
{
main0_out out = {};
out.uFragColor = in.o_color;
return out;
}
TxC@�@mainzzzzzmain0�#include <metal_stdlib>
#include <simd/simd.h>
<...>
DX12
The DirectX 12 backend doesn't have a pipeline cache. However, #2877 lays out how one could be created, similar to what the Metal backend does.
Thank you for filing this!
About the Metal backend: the pipeline caching path is the old code we use with SPIRV-Cross. I tried to adjust it for Naga, but it wasn't easy. So for all intents and purposes, consider there to be no implementation on Metal right now (since Naga is the future).
Okay, disregard basically everything that I wrote in the Metal section above, because on macOS 11.0 we can use the poorly-named, poorly-documented MTLBinaryArchive, which does pretty much what we want. We still have to do some writing to a file and then reading back, because it takes URLs as parameters instead of raw bytes, but that's acceptable enough.
I've made a start on this at this branch: master...expenses:metal-pipeline-cache
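Because MTLBinaryArchive only speaks URLs, get_pipeline_cache_data has to bounce through the filesystem. A minimal sketch of that round trip, where `serialize_archive_to` is a hypothetical stand-in for the real Metal call (serializing the archive to a URL), not an actual gfx-rs or metal-rs function:

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Pick a scratch file in the OS temp directory to serialize the archive into.
fn temp_cache_path(tag: &str) -> PathBuf {
    std::env::temp_dir().join(format!("pipeline-cache-{}.bin", tag))
}

// Ask the (stand-in) archive to serialize itself to a file, then read the
// bytes back so the caller gets a plain Vec<u8>, as the gfx-hal API expects.
fn get_pipeline_cache_data(
    serialize_archive_to: impl Fn(&PathBuf) -> io::Result<()>,
) -> io::Result<Vec<u8>> {
    let path = temp_cache_path("demo");
    serialize_archive_to(&path)?; // stands in for the archive's serialize-to-URL call
    let bytes = fs::read(&path)?; // read back the bytes we actually want
    let _ = fs::remove_file(&path); // best-effort cleanup of the scratch file
    Ok(bytes)
}
```

Loading goes the same way in reverse: write the caller's bytes to a temp file and hand that URL to the archive descriptor when creating it.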
I actually did a small bit of research into the DX12 docs for #2877 last night, since it seems to be free now, so I could keep looking into it and see if I get anywhere. I'm not too familiar with gfx-rs, though, and I've never done anything with DX12, so I wouldn't rely on me, but I will try 😄
Okay, I've been doing some testing of #3719 using a hacky fork of https://github.com/repi/shadertoy-browser. Basically, it loads 8866 SPIR-V fragment shaders, creates a pipeline for each one using a basic vertex shader, then exits.
macOS has a system shader cache at $(getconf DARWIN_USER_CACHE_DIR)/com.apple.metal, so that needs to be taken into account when timing this.
Here are some timings with and without caches:
wiped system cache, no pipeline cache: 683.82s, 659.88s
hot system cache, no pipeline cache: 24.39s, 27.21s
wiped system cache, hot pipeline cache: 442.45s, 451.47s
hot system cache, hot pipeline cache: 25.56s, 26.97s, 28.54s
So it looks like using Binary Archives as a pipeline cache does give an improvement over no cache (roughly 672s down to 447s with a wiped system cache, about a third faster), but not nearly to the degree that you'd expect! It could be that the Binary Archive isn't set up correctly, but I've tested this with MTLPipelineOptionFailOnBinaryArchiveMiss: it takes the same amount of time (450.93s) and successfully compiles all 8866 pipelines.
I'm going to look into a second cache to store SPIR-V -> MSL transformations to see how much that improves things.
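The idea of that second cache is just memoizing the translation step so a warm run never invokes the cross-compiler. A minimal sketch, where `translate` is a hypothetical stand-in for the real SPIRV-Cross/Naga translation call:

```rust
use std::collections::HashMap;

// Memoization cache for SPIR-V -> MSL translation, keyed by the raw SPIR-V
// bytes. Hit/miss counters make it easy to verify the cache is doing work.
struct TranslationCache {
    entries: HashMap<Vec<u8>, String>,
    hits: usize,
    misses: usize,
}

impl TranslationCache {
    fn new() -> Self {
        TranslationCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    // Return the cached MSL if we've seen this module before; otherwise run
    // the (expensive) translation once and remember the result.
    fn get_or_translate(
        &mut self,
        spirv: &[u8],
        translate: impl Fn(&[u8]) -> String,
    ) -> String {
        if let Some(msl) = self.entries.get(spirv) {
            self.hits += 1;
            return msl.clone();
        }
        self.misses += 1;
        let msl = translate(spirv);
        self.entries.insert(spirv.to_vec(), msl.clone());
        msl
    }
}
```

Combined with the on-disk serialization sketched earlier in the thread, this would let a warm run skip cross-compilation entirely and pay only the MSL-to-library compile (which the system shader cache then absorbs).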