GPUOpen-Tools/GPU-Reshape

Shader cache

miguel-petersen opened this issue · 1 comments

While the instrumentation may change quite often, the initial parsing rarely does. The prototype layer made use of a global cache to speed up instrumentation times; we need something similar here.

Apologies, I completely forgot to write up this issue.

The basic idea is that instrumentation of shader data is deterministic for a given "instrumentation key". So, if the user constantly instruments the entire application but only one shader has changed, re-instrumenting all the other shaders is redundant.

Instrumentation keys (to be documented) essentially describe how a program is to be instrumented, see:
Source\Backends\DX12\Layer\Include\Backends\DX12\States\ShaderInstrumentationKey.h
Source\Backends\Vulkan\Layer\Include\Backends\Vulkan\States\ShaderModuleInstrumentationKey.h

For Vulkan and DX12, it should be sufficient to use the featureBitSet and combinedHash as the lookup indices.

For the shader compilation, please see CompileShader in:
Source\Backends\DX12\Layer\Source\Compiler\ShaderCompiler.cpp
Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp

The basic idea is to do a lookup on three items: the featureBitSet, the combinedHash and the effective hash of the shader binary. For Vulkan, we might need to compute this hash ourselves; for DX12, see the ShaderStateKey inside the ShaderState.
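
To make the three-item lookup concrete, here is a rough sketch of a composite key that can be used with std::unordered_map. ShaderCacheKey, CombineHash and ShaderCacheKeyHasher are hypothetical names, and the 64-bit field types are an assumption; the real featureBitSet and combinedHash types live in the instrumentation key headers listed above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical composite key. Field names follow the issue text,
// the 64-bit types are an assumption.
struct ShaderCacheKey {
    std::uint64_t featureBitSet{0};
    std::uint64_t combinedHash{0};
    std::uint64_t shaderHash{0}; // effective hash of the shader binary

    bool operator==(const ShaderCacheKey& rhs) const {
        return featureBitSet == rhs.featureBitSet &&
               combinedHash == rhs.combinedHash &&
               shaderHash == rhs.shaderHash;
    }
};

// Common hash-combine idiom, mixes one value into the running seed.
inline void CombineHash(std::size_t& seed, std::uint64_t value) {
    seed ^= std::hash<std::uint64_t>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hasher so the key can be used as std::unordered_map<ShaderCacheKey, T, ShaderCacheKeyHasher>.
struct ShaderCacheKeyHasher {
    std::size_t operator()(const ShaderCacheKey& key) const {
        std::size_t seed = 0;
        CombineHash(seed, key.featureBitSet);
        CombineHash(seed, key.combinedHash);
        CombineHash(seed, key.shaderHash);
        return seed;
    }
};
```

Keeping the shader binary hash in the key means two different shaders that happen to be instrumented the same way never share an entry.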

If this lookup succeeds, that is, an entry is present, we skip the entire function, including InitializeModule, and just load the binary. Something like:

if (auto entry = someDatabase.Find(job.instrumentationKey)) {
    // Assign the instrument
    job.state->AddInstrument(job.instrumentationKey, entry->shaderBlob);

    // Mark as passed
    ++job.diagnostic->passedJobs;
    return;
}
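
The someDatabase above does not exist yet. As a minimal sketch of what it could look like, assuming an in-memory map guarded by a mutex: ShaderCache, CacheKey, ShaderBlob, Find and Add are all hypothetical names, and the key is simplified to a tuple of three 64-bit values.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key: (featureBitSet, combinedHash, shader binary hash). Types are assumptions.
using CacheKey = std::tuple<std::uint64_t, std::uint64_t, std::uint64_t>;
using ShaderBlob = std::vector<std::uint8_t>;

class ShaderCache {
public:
    // Returns a copy of the cached blob, or std::nullopt on a miss.
    std::optional<ShaderBlob> Find(const CacheKey& key) const {
        std::lock_guard<std::mutex> guard(lock);
        auto it = entries.find(key);
        if (it == entries.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Stores the blob produced by a full compile. An existing entry is kept as-is.
    void Add(const CacheKey& key, ShaderBlob blob) {
        std::lock_guard<std::mutex> guard(lock);
        entries.emplace(key, std::move(blob));
    }

private:
    mutable std::mutex lock; // compile jobs may run on several threads
    std::map<CacheKey, ShaderBlob> entries;
};
```

Returning a copy keeps the sketch simple; a real implementation would more likely hand out a shared or reference-counted blob to avoid copying large binaries.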

If the key doesn't exist, compile as usual and populate the cache once the instrumented binary is available. Then, implement some form of intermittent serialization to disk, on an async thread or similar.
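
One possible shape for the intermittent serialization, as a sketch only: a small helper that owns a background thread and invokes a flush callback when the cache has been marked dirty, either on a timer or at shutdown. PeriodicFlusher and MarkDirty are hypothetical names, and the callback is assumed to snapshot the cache and write it to disk.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: runs `flush` on a background thread whenever the cache is dirty.
class PeriodicFlusher {
public:
    PeriodicFlusher(std::chrono::milliseconds interval, std::function<void()> flush)
        : interval(interval), flush(std::move(flush)), worker([this] { Run(); }) {}

    ~PeriodicFlusher() {
        {
            std::lock_guard<std::mutex> guard(lock);
            exiting = true;
        }
        wake.notify_all();
        worker.join(); // pending changes are flushed one last time before this returns
    }

    // Call after a new entry has been added to the cache.
    void MarkDirty() {
        std::lock_guard<std::mutex> guard(lock);
        dirty = true;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            // Sleep for one interval, or wake up early on shutdown.
            wake.wait_for(guard, interval, [this] { return exiting; });
            if (dirty) {
                dirty = false;
                guard.unlock();
                flush(); // serialize outside the lock so compile jobs are not stalled
                guard.lock();
            }
            if (exiting) {
                return;
            }
        }
    }

    std::mutex lock;
    std::condition_variable wake;
    std::chrono::milliseconds interval;
    std::function<void()> flush;
    bool dirty{false};
    bool exiting{false};
    std::thread worker; // declared last so every other member is ready when it starts
};
```

The compile path would only call MarkDirty() after adding an entry, so the disk write never sits on the instrumentation hot path.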

The shader cache must also have a basic versioning scheme, probably just globally, to allow for invalidation when the shader compiler has changed.
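
A global versioning scheme could be as small as a fixed header at the start of the cache file, roughly like the sketch below. CacheHeader, kCacheMagic, kCacheVersion, WriteHeader and IsCacheCompatible are hypothetical names, and the constants are placeholders.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical on-disk header, written at the start of the cache file.
struct CacheHeader {
    std::uint32_t magic;
    std::uint32_t version;
};

constexpr std::uint32_t kCacheMagic = 0x48535247; // arbitrary placeholder file tag
constexpr std::uint32_t kCacheVersion = 1;        // bump whenever the shader compiler output changes

// Serializes a header carrying the given version.
std::vector<std::uint8_t> WriteHeader(std::uint32_t version) {
    CacheHeader header{kCacheMagic, version};
    std::vector<std::uint8_t> bytes(sizeof(header));
    std::memcpy(bytes.data(), &header, sizeof(header));
    return bytes;
}

// Returns true only if the data starts with a header matching the current version.
// On false, the whole cache is discarded and rebuilt from scratch.
bool IsCacheCompatible(const std::vector<std::uint8_t>& data) {
    CacheHeader header{};
    if (data.size() < sizeof(header)) {
        return false;
    }
    std::memcpy(&header, data.data(), sizeof(header));
    return header.magic == kCacheMagic && header.version == kCacheVersion;
}
```

Checking the version once at load time keeps invalidation global, so individual entries need no version of their own.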

That's the basic idea, simple in theory, hopefully also in implementation.