Handling assets with path references in the build process

Question

Closed this issue 4 years ago · 5 comments

A big use-case for the asset pipeline will be to compile shaders into a platform-specific format. These shaders are usually written with #include statements to pull in other shader files when compiling. It would be good if we can support this.

I think it would be possible with the following:

The build_deps that Importers return in AssetMetadata hold AssetId values instead of AssetUUID values.
After a full batch of imports completes, try to resolve each AssetId::FilePath in build_deps to an AssetUUID. If resolution fails, raise an error.

This would only affect the import process, so the effect on the rest of the system would be minimal.
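The resolution step could look roughly like the following Rust sketch. The AssetUUID alias, the AssetId variants, the UnresolvedDep error, and the path_index lookup are all assumptions made for illustration, not the crate's actual types. It also assumes each source path maps to exactly one asset, which may not hold for files that import into several assets.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Hypothetical stand-in for the crate's AssetUUID type.
pub type AssetUUID = [u8; 16];

/// A build dependency as an importer might report it: either an already-known
/// UUID or a path reference such as a shader `#include`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum AssetId {
    Uuid(AssetUUID),
    FilePath(PathBuf),
}

/// Hypothetical error: a path reference matched no asset in the imported batch.
#[derive(Debug, PartialEq, Eq)]
pub struct UnresolvedDep(pub PathBuf);

/// Resolves every dependency to a UUID once the whole batch has been imported.
/// `path_index` maps a source path to the UUID of the asset imported from it.
pub fn resolve_build_deps(
    deps: &[AssetId],
    path_index: &HashMap<PathBuf, AssetUUID>,
) -> Result<Vec<AssetUUID>, UnresolvedDep> {
    deps.iter()
        .map(|dep| match dep {
            AssetId::Uuid(uuid) => Ok(*uuid),
            AssetId::FilePath(path) => path_index
                .get(path)
                .copied()
                .ok_or_else(|| UnresolvedDep(path.clone())),
        })
        // Collecting into a Result stops at the first unresolved path.
        .collect()
}
```

Running this only after the batch finishes means an #include can point at a file imported later in the same batch without failing.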

The reason I want build_deps to be correct is that it will be essential for implementing distributed builds in the future. If we know all the inputs required for a build (i.e. our dependency graph is correct), we can send this data to another machine and perform the build there, spreading the significant CPU load of the build process.
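As a rough illustration of "knowing all the inputs", the sketch below walks a resolved dependency graph and collects everything a single build would need to ship to another machine. The build_inputs function, the AssetUUID alias, and the map-based graph are hypothetical, not part of the existing code.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-in for the crate's AssetUUID type.
pub type AssetUUID = [u8; 16];

/// Collects `root` plus everything it transitively depends on for its build.
/// `build_deps` maps each asset to its direct, already-resolved build dependencies.
pub fn build_inputs(
    root: AssetUUID,
    build_deps: &HashMap<AssetUUID, Vec<AssetUUID>>,
) -> BTreeSet<AssetUUID> {
    let mut inputs = BTreeSet::new();
    let mut pending = vec![root];
    while let Some(id) = pending.pop() {
        // `insert` returns false for assets already visited, which also
        // keeps the walk from looping forever on cyclic includes.
        if inputs.insert(id) {
            if let Some(deps) = build_deps.get(&id) {
                pending.extend(deps.iter().copied());
            }
        }
    }
    inputs
}
```

If any edge is missing from build_deps, the remote machine would lack an input, which is why the graph has to be correct before builds can be distributed.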

Answer 1 · 2018-12-22T21:19:59.000Z

I don't fully understand how paths help in that process. What if, for certain platform-dependent assets, the importer instead generated multiple asset UUIDs, one for each artifact (usually one per platform)? We could mark those platforms with search tags and generate the UUID aliases in a platform-dependent way.

Answer 2 · 2018-12-22T21:26:27.000Z

The assets themselves are not platform-dependent, but the build artifact will be. This is to enable use-cases like compressing textures/animations/meshes differently for mobile platforms, or choosing a different shader format when running on a platform that perhaps only supports GLSL rather than SPIR-V.

So the AssetUUID references the asset itself, then there will be a processing pipeline that can optionally optimize or otherwise make the asset appropriate for loading on the target platform, finally producing a "build artifact" that is the actual [u8] to be loaded by the engine.

Check these diagrams:
https://github.com/kabergstrom/atelier-assets/blob/master/docs/graphics/import.svg
https://github.com/kabergstrom/atelier-assets/blob/master/docs/graphics/build.svg
https://github.com/kabergstrom/atelier-assets/blob/master/docs/graphics/processing_example_meshopt.svg

The identifier of a build artifact is the hash of all the inputs to the build function, so it can be cached properly.

Answer 3 · 2018-12-22T21:39:52.000Z

Currently the idea was that a single asset UUID means a single artifact. If I understand correctly, you want to add another layer of indirection, where an asset is a grouping of artifacts and one is selected based on who's asking. We should probably add a concept of ArtifactUUID (which will be of exactly the same type, because why not) and a way to resolve those constraints by specifying some "query parameters" (or rather a single config on the client side). I wouldn't touch the way we identify the assets themselves, though.

Answer 4 · 2018-12-22T22:07:05.000Z

The idea is that a single asset UUID references a user-visible piece of data, like a mesh, texture, material, shader or similar. User-editable metadata (.meta file) is then attached to this in the form of import settings, build settings, search tags etc. These two pieces of data are managed by the user and can be referenced by other things.

In pseudocode, getting a loadable [u8] blob works like this:

fn build_asset(asset_id, platform) -> [u8] {
    // This part is what we have implemented so far with ImportedAsset.
    let (asset, metadata) = get_import_artifact(asset_id);
    let platform_options = metadata.get_platform_options(platform);
    // The processing pipeline optimizes/converts/combines the intermediate representation of the asset.
    let processed_asset = processing_pipeline(asset, metadata.pipeline, platform_options);
    // This can be a no-op if the intermediate and runtime representations are the same.
    let platform_artifact = build_for_platform(processed_asset, platform, platform_options);
    return platform_artifact;
}

get_import_artifact here would need to get the latest imported asset with the specified AssetID and thus depends on filesystem state, but past that point the function is completely pure in the functional sense.

If you are familiar with how compilers work, there are a lot of parallels.

Compilers parse source data and produce symbols + attached code: we parse source files and produce assets + metadata.
Compilers optimize code, inline functions, etc.: we optimize asset data, combine multiple textures into one, etc. in the "processing_pipeline".
Compilers generate "machine code" for the target platform based on the build configuration: we generate a "runtime format" for the asset based on the platform and build settings.

A compiler output should be deterministic based on the input source files, just like our asset pipeline should be.

Inside of this build process, we don't generate any externally-visible identifiers that the user will ever use, but we still want to have an identifier for "the result of building an asset A" so that we can cache it. This identifier can thus be the hash of all inputs, more specifically the hash of this tuple: (asset_data_hash, metadata_hash, importer_version, builder_version, target_platform).
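A minimal Rust sketch of such a cache key follows, using the fields of the tuple above. The BuildInputs struct, its field types, and the artifact_id function are assumptions made for illustration. It uses the standard library's DefaultHasher for brevity; its algorithm is not guaranteed to stay the same across Rust releases, so a cache that persists on disk would probably want a hash with a fixed, documented algorithm instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The inputs named in the tuple above. All field types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct BuildInputs {
    pub asset_data_hash: u64,
    pub metadata_hash: u64,
    pub importer_version: u32,
    pub builder_version: u32,
    pub target_platform: String,
}

/// Derives a cache key by hashing every build input together.
/// Identical inputs give the same key; changing any one field should give a different key.
pub fn artifact_id(inputs: &BuildInputs) -> u64 {
    // DefaultHasher::new() starts from fixed keys, so the result is
    // repeatable between runs of the same compiled program.
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}
```

Because the target platform is part of the key, building the same asset for two platforms produces two separately cached artifacts while the asset keeps a single AssetUUID.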

Does that make sense?

Answer 5 · 2021-01-10T17:13:00.000Z

We'll open a new issue once the build pipeline is in place