adamrehn/ue4-docker

Proposal: split up large layers to work around registry layer size limits

adamrehn opened this issue · 2 comments

Problem overview

As of today, the ue4-minimal image has one large, monolithic layer that contains most of the files for the Installed Build of the Unreal Engine (see the relevant COPY directive for Linux and Windows). Optional components (DDC, debug symbols, and templates) are currently split out into their own layers to facilitate exclusion of the relevant COPY directives when users don't want to include these components, but all non-optional files are bundled into the single large layer.
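The current layering can be pictured as a list of COPY directives where each optional component contributes one line that is simply dropped when the user excludes it. The Python sketch below shows this. It is not ue4-docker's actual templating code. The builder stage name "builder" and the helper copy_directives are hypothetical, and the paths are taken from the docker history output quoted in this issue.

```python
# Hypothetical stage name and paths, modelled on the docker history output in this issue.
BUILDER_STAGE = "builder"
INSTALL_ROOT = "/home/ue4/UnrealEngine"
INSTALLED_BUILD = "/home/ue4/UnrealEngine/LocalBuilds/Engine/Linux"
COMPONENTS_ROOT = "/home/ue4/UnrealEngine/Components"
# Optional components, in the order their layers are added to the image.
OPTIONAL_COMPONENTS = ["DDC", "DebugSymbols", "TemplatesAndSamples"]


def copy_directives(excluded: set[str]) -> list[str]:
    """Return one COPY directive per layer, omitting excluded optional components."""
    # All non-optional files currently land in this single monolithic layer.
    directives = [f"COPY --from={BUILDER_STAGE} {INSTALLED_BUILD} {INSTALL_ROOT}"]
    for component in OPTIONAL_COMPONENTS:
        if component not in excluded:
            # Each optional component has its own layer, so excluding it just drops this line.
            directives.append(
                f"COPY --from={BUILDER_STAGE} {COMPONENTS_ROOT}/{component} {INSTALL_ROOT}"
            )
    return directives


for line in copy_directives(excluded={"DebugSymbols"}):
    print(line)
```

Because the monolithic layer is unconditional, no user-facing option can make it smaller. This is why the proposal below targets that first directive.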

Over time, the monolithic layer has grown larger and larger, and it remains significantly bigger than the layers for the optional components. The excerpt below from the output of docker history ghcr.io/epicgames/unreal-engine:dev-5.4.1 illustrates the relative sizes:

CREATED BY                                                                           SIZE
COPY /home/ue4/UnrealEngine/Components/TemplatesAndSamples /home/ue4/UnrealEngine    4.31GB
COPY /home/ue4/UnrealEngine/Components/DebugSymbols /home/ue4/UnrealEngine           23.2GB
COPY /home/ue4/UnrealEngine/Components/DDC /home/ue4/UnrealEngine                    1.49GB
COPY /home/ue4/UnrealEngine/LocalBuilds/Engine/Linux /home/ue4/UnrealEngine          38.3GB

This poses challenges when working with container registries that limit the maximum size of an individual filesystem layer, such as GitHub Container Registry (a purported 10GB), Amazon ECR (~50GiB) and, to a lesser extent, Azure Container Registry (200GiB). These limits typically apply to the compressed size of a layer rather than its raw size, since the registry only ever stores the compressed data on disk. This distinction helps considerably, because many of the Unreal Engine files compress extremely well. The compressed sizes of the layers whose raw sizes were listed above are shown below:

Templates and Samples:    2.58GiB
Debug Symbols:            5.12GiB
DDC:                      1.45GiB 
Monolithic Layer:         12.05GiB
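To check locally whether a directory is likely to fit under a registry's per-layer limit, one option is to stream it through a gzip-compressed tar and count the output bytes. The Python sketch below does this without writing the archive to disk. It assumes gzip-compressed tar layers, which is the common default for image layers. The result is only an estimate, because Docker's compression level, tar headers and timestamps may differ from Python's tarfile defaults. The helper names are hypothetical.

```python
import tarfile


class _CountingSink:
    """Write-only file-like object that discards data and counts the bytes written."""

    def __init__(self) -> None:
        self.bytes_written = 0

    def write(self, data: bytes) -> int:
        self.bytes_written += len(data)
        return len(data)


def estimate_compressed_layer_size(directory: str) -> int:
    """Approximate the gzip-compressed tar size of a directory, in bytes."""
    sink = _CountingSink()
    # "w|gz" selects tarfile's streaming mode, which only ever calls write() on the sink.
    with tarfile.open(fileobj=sink, mode="w|gz") as archive:
        archive.add(directory, arcname=".")
    return sink.bytes_written
```

Comparing the returned value against a registry's documented limit (in bytes) gives an early warning before a push is attempted. Any limit treated this way should carry the same caveats about units as the ones discussed here.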

Unfortunately, the monolithic layer has evidently grown large enough in Unreal Engine 5.4.2 that it can no longer be pushed to GitHub Container Registry, blocking the upload of the official development container images hosted under the EpicGames organisation. I'm not sure what the exact layer size limit for GHCR is. The compressed sizes of the monolithic layers from both 5.4.1 and 5.4.2 are displayed as 12.05GiB after rounding. This exceeds both the 10GB limit stated in the GHCR docs and the 10GiB value that figure would represent if its unit notation were simply incorrect. Yet the 5.4.1 layer pushed successfully, whereas the 5.4.2 layer is rejected. Presumably the real limit lies somewhere between the compressed sizes of the two layers.
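The unit ambiguity is easier to see with the arithmetic written out. The Python sketch below assumes the displayed 12.05GiB figures were rounded to two decimal places. It shows that a documented 10GB limit is only about 9.31GiB, and that the effective limit must fall within a window roughly 10.7MB wide.

```python
GB = 10**9   # decimal gigabyte
GiB = 2**30  # binary gibibyte

documented_limit = 10 * GB
print(f"10GB = {documented_limit / GiB:.2f}GiB")  # → 10GB = 9.31GiB

# Assumption: any size displayed as 12.05GiB lies in [12.045, 12.055) GiB.
lower, upper = 12.045 * GiB, 12.055 * GiB
print(f"window width = {(upper - lower) / 10**6:.1f}MB")  # → window width = 10.7MB

# Both interpretations of the documented limit sit well below the layers that were accepted.
assert documented_limit < 10 * GiB < lower
```

The exact limit therefore cannot be inferred from the displayed figures, which supports the approach of keeping every layer comfortably below 10GB rather than targeting a precise threshold.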

Proposed solution

Given that we already have code to split files out into separate layers, the simplest solution is to leverage this logic to split non-optional files out into dedicated layers in the same manner as optional components, eliminating the bottleneck of a single monolithic layer. For the sake of simplicity, I propose performing this split based on the direct child directories of the Unreal Engine's top-level Engine directory, with each selected subdirectory stored in a dedicated layer. The selection of these directories is important, since they must meet two criteria:

  1. Each subdirectory must exist in all supported versions of the Unreal Engine (currently 4.27 and newer, based on our policy of supporting the 6 most recent releases) and should be reasonably expected to continue existing in future versions.

  2. Each subdirectory should be large enough that splitting it out is actually worth the overheads of adding a new layer. What constitutes "large enough" is subjective, but I'd argue that subdirectories smaller than 1MB are almost certainly not worth it, and subdirectories smaller than 1GB are probably not worth it unless they grow larger in future releases of the Unreal Engine.
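The two criteria can be expressed as a small filter over per-version size measurements. In the Python sketch below, criterion 1 is an intersection of directory names across versions, and criterion 2 compares the median size against a threshold. The median is a stand-in chosen here for "typical" size, and it is an assumption; the minimum would be a stricter alternative. The function name and the sample numbers are purely illustrative, not real measurements.

```python
from statistics import median
from typing import Mapping

GB = 10**9
MB = 10**6
# Already split out as part of the optional DDC component.
ALREADY_SPLIT = {"DerivedDataCache"}


def select_candidates(
    sizes_by_version: Mapping[str, Mapping[str, int]], min_bytes: int = GB
) -> list[str]:
    """Return Engine subdirectories that satisfy both selection criteria."""
    tables = list(sizes_by_version.values())
    # Criterion 1: the subdirectory must exist in every supported version.
    common = set(tables[0]).intersection(*tables[1:])
    candidates = []
    for subdir in sorted(common - ALREADY_SPLIT):
        # Criterion 2: the typical (median) size across versions must clear the threshold.
        if median(t[subdir] for t in tables) >= min_bytes:
            candidates.append(subdir)
    return candidates


# Made-up sizes, for illustration only.
example = {
    "4.27": {"Binaries": 3 * GB, "Platforms": 2 * GB, "Shaders": 15 * MB, "DerivedDataCache": 1 * GB},
    "5.4": {"Binaries": 6 * GB, "Shaders": 20 * MB, "DerivedDataCache": 2 * GB},
}
print(select_candidates(example))  # → ['Binaries']
```

Lowering min_bytes to 1MB would also admit Shaders in this toy data, which mirrors the open question about smaller subdirectories raised later in the thread.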

I've analysed the list of Engine subdirectories and their sizes for Unreal Engine 4.27 through to Unreal Engine 5.4, and observed the following:

  • The Platforms subdirectory is present in some releases but not others, and therefore not a viable candidate.

  • The other 12 subdirectories are present in all of the examined releases. (Note that the DerivedDataCache subdirectory is already split out as part of the optional DDC component, so that leaves 11 subdirectories to consider.)

  • The following subdirectories are typically less than 1MB in size, and therefore almost certainly not worth splitting out: Build, Config, Programs.

  • The following subdirectories are typically less than 1GB in size, and therefore probably not worth splitting out: Documentation, Shaders.

  • The following subdirectories are typically greater than 1GB in size, and therefore the best candidates for splitting out: Binaries, Content, Extras, Intermediate, Plugins, Source.
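One way to apply the selection is to reuse the component approach: in the builder stage, move each chosen Engine subdirectory into its own component directory that mirrors the engine layout, then emit one COPY per component plus one for whatever remains. The Python sketch below is a hypothetical illustration of that idea. The function name, the Engine<Subdir> component naming, and the "builder" stage name are assumptions, not ue4-docker's actual split logic.

```python
import shutil
from pathlib import Path

# Hypothetical values: the builder stage name and the final install location.
BUILDER_STAGE = "builder"
INSTALL_DESTINATION = "/home/ue4/UnrealEngine"


def split_engine_subdirectories(
    install_root: Path, components_root: Path, subdirs: list[str]
) -> list[str]:
    """Move each selected Engine subdirectory into its own component directory.

    Returns the COPY directives needed to reassemble the Installed Build:
    one layer for whatever is left over, plus one layer per split subdirectory.
    """
    # Whatever remains in the Installed Build after the moves becomes a much smaller layer.
    directives = [f"COPY --from={BUILDER_STAGE} {install_root} {INSTALL_DESTINATION}"]
    for subdir in subdirs:
        source = install_root / "Engine" / subdir
        if not source.is_dir():
            # Skip anything absent from this engine version.
            continue
        # Mirror the Engine/<subdir> layout so the component copies into the same root.
        component = components_root / f"Engine{subdir}"
        destination = component / "Engine" / subdir
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))
        directives.append(f"COPY --from={BUILDER_STAGE} {component} {INSTALL_DESTINATION}")
    return directives
```

Called with Binaries, Content, Extras, Intermediate, Plugins and Source, this would turn the single 38.3GB directive into seven smaller ones. Because each component mirrors the engine's own directory layout, the layers recombine into the same tree in the final image.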

Discussion points

My intention is to experiment with splitting out the >1GB subdirectories listed above, since they're obvious candidates. @slonopotamus and @TBBle, do you think we should also bother with the Documentation (typically on the order of a few hundred megabytes) and Shaders (typically on the order of 10 to 20 megabytes) subdirectories?

Okay, my Linux and Windows tests have succeeded with recent versions of the Unreal Engine. I still need to test that everything works correctly for older versions like 4.27, but in the meantime I'll tag a release and see whether this allows the official 5.4.2 Linux images to push successfully to GHCR.

After fixing an unrelated Windows breakage in commit 4e53548, the 4.27 tests are now succeeding for both Linux and Windows. Given that this new layer splitting logic works for both older and newer versions of UE (and did indeed allow me to successfully push the 5.4.2 Linux images to GHCR), I'm satisfied that it's ready for broad use.