amethyst/rendy

Make rendering graph truly dynamic

Frizi opened this issue · 6 comments

Frizi commented

The current design of the rendering graph assumes that the graph is rarely, if ever, rebuilt. The scheduling process is really designed as a one-off thing. The lack of dynamism manifests in things like resizing the window requiring a full rebuild. Creating resources like depth map images on demand is also very hard. The only way to accomplish it without rebuilds is through sharing via the Aux type and manual synchronization, but this basically circumvents any usefulness of the graph in the first place. On top of that, we don't really support subpasses and have severe problems with oversynchronization. There are also some challenges with the implementation, like rendy-chain being quite far removed from what the graph is doing, or rendering directly to the surface image requiring a complex separate code path in every render node that wants to support it.

To solve those problems, the internal graph scheduler and render node API must be reworked to assume that things are dynamic. An additional goal is to be able to serialize the graph setup and hot-reload it on the fly.

Proposed high level design

The rendering graph lifecycle would effectively be split into three phases:

  • building: we take a node builder and make a real node struct out of it.
  • construction: The render nodes declare the resources they are going to use and a "rendering execution" closure they want to run later
  • execution: The registered rendering executions are evaluated in parallel, with the gpu-side synchronization taken care of by the graph

Graph building

A big difference between the existing and the proposed design is that rendering nodes themselves declare the resources that the graph should create for them. The nodes can also produce and accept parameters, which can contain arbitrary data types, including just a resource id. All dependencies between nodes are automatically inferred based on the data they read and write.

A simple example of this would be a PresentNode declaring the output image resource, which potentially multiple rendering nodes would then be able to accept as their render target.

let mut builder = GraphBuilder::new();

// Add present node that provides a "color" image for others to render to
// The type annotation is only for demonstration. The compiler is fully able to infer it.
let color: Parameter<ImageId> = builder.add(Present::new());

// The CreateDepth node creates a suitable depth image that matches the color image in size.
// It doesn't contribute to the actual rendering job in any way, but that's allowed.
let depth = builder.add(CreateDepth::new(color, gfx_hal::format::Format::D32Sfloat));

// Perform some compute job, returning a buffer resource that other nodes can use
let some_buffer = builder.add(ComputeReticulatingSplines::new());

// render shadow data into separate set of images/buffers, using some data
// from previous node. The data type is arbitrary, anything will work
// as long as other nodes can accept it as input.
let shadow_maps: Parameter<ShadowsData> = builder.add(RenderShadows::new(some_buffer));

// render something to the color image and use the provided shadows and depth buffer
builder.add(ForwardRender::new(color, depth, shadow_maps));

// create a resized copy of color image and use it to perform some post effects
let resized_color = builder.add(Resize::new(color, 0.5));
builder.add(BloomPostEffect::new(color, resized_color));

The code above runs basically only once, or extremely infrequently. The builder can be swapped out, but there is rarely a need for this. Most of the dynamism can be accomplished by the logic of the render nodes during the construction phase.
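
To give a sense of how the builder could track dependencies purely through the returned handles, here is a minimal sketch; the types and signatures below are illustrative, not the actual rendy API.

use std::any::Any;
use std::marker::PhantomData;

// Typed handle to a value produced by a node during construction.
// Passing it into another node builder is what creates the dependency edge.
pub struct Parameter<T>(usize, PhantomData<T>);

// Hypothetical trait implemented by every node builder added to the graph.
pub trait NodeBuilder: Any {
    // The value this node exposes to downstream nodes (e.g. an ImageId).
    type Output;
}

pub struct GraphBuilder {
    // Type-erased node builders, kept in insertion order.
    nodes: Vec<Box<dyn Any>>,
}

impl GraphBuilder {
    pub fn new() -> Self {
        GraphBuilder { nodes: Vec::new() }
    }

    // Registers a node builder and returns a typed parameter handle.
    // Any node that later accepts this handle implicitly depends on this one.
    pub fn add<N: NodeBuilder>(&mut self, node: N) -> Parameter<N::Output> {
        let index = self.nodes.len();
        self.nodes.push(Box::new(node));
        Parameter(index, PhantomData)
    }
}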

Graph construction and evaluation

Every node declares the resources it will use during the task it performs. The usage can be as simple as saying "I will use the image given to me as a color attachment on slot 0". More complex nodes can also declare arbitrary use_image or use_buffer usages and later reference those resources. This happens every frame and has access to aux data.

// `self.color` can come from an argument passed into the node builder.
// It is of type `Parameter<ImageId>`.
let color = *ctx.get_parameter(self.color)?;
ctx.use_color(0, color)?;
let depth = *ctx.get_parameter(self.depth)?;
ctx.use_depth(depth, true)?; // true here means write access

let some_image = *ctx.get_parameter(self.some_image)?;
let image_usage = ctx.use_image(some_image, ImageUsage::Sampled(ShaderUsage::FRAGMENT));

Later the node returns the outputs it declared (in this case ()) and the rendering job it actually performs (the closure is what is later used in the execution phase).

Ok(((), NodeExecution::pass(|ctx, aux| {
    let image_object = ctx.get_image(&image_usage);
    // perform the rendering job here,
    // including writing descriptor set or recording commands into command buffer
})))

The code here is essentially equivalent to existing render groups. There are also other NodeExecution types that are better suited for other cases, like presenting, image transfers, or compute nodes.
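
For illustration, the different execution kinds could be modelled roughly like this; the variant names and contexts are hypothetical, only the overall shape matters.

// Placeholder contexts; the real ones would wrap command recording state.
pub struct PassContext;
pub struct SubmitContext;

pub enum NodeExecution<'a, T: ?Sized> {
    // Graphics work recorded inside a render pass; candidates for being
    // merged into a shared subpass by the scheduler.
    Pass(Box<dyn FnOnce(&mut PassContext, &T) + 'a>),
    // Arbitrary queue submission: image transfers, compute dispatches, etc.
    Submission(Box<dyn FnOnce(&mut SubmitContext, &T) + 'a>),
    // Terminal execution that presents a swapchain image.
    Present(Box<dyn FnOnce(&mut SubmitContext, &T) + 'a>),
    // The node only declares or creates resources and records nothing.
    None,
}

impl<'a, T: ?Sized> NodeExecution<'a, T> {
    // Convenience constructor mirroring the NodeExecution::pass(..) call
    // used in the example above.
    pub fn pass(f: impl FnOnce(&mut PassContext, &T) + 'a) -> Self {
        NodeExecution::Pass(Box::new(f))
    }
}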

All resources used exclusively by this node are still managed by it. The node state itself is persistent (until the GraphBuilder as a whole is replaced). If a descriptor set is needed, the node is able to create it in the build phase and use it during the evaluation phase.

The difference is that resources taken from the graph (like image_object in the example above) cannot be assumed to be the same across frames. They will MOSTLY be the same within a given "frame_in_flight", but this can change at an arbitrary time due to logic happening in other nodes. Rendy can provide a set of utility types/functions to make this easier to deal with in common cases.
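
One such utility could be a small cache that rebuilds a derived object (a descriptor set, a framebuffer, and so on) whenever the graph-provided resource it was built from changes. A rough sketch with made-up names:

// Caches a value derived from a graph resource, keyed by something that
// identifies the resource's current incarnation (a handle, a generation
// counter, an extent, ...).
pub struct DerivedCache<K: PartialEq, V> {
    entry: Option<(K, V)>,
}

impl<K: PartialEq, V> DerivedCache<K, V> {
    pub fn new() -> Self {
        DerivedCache { entry: None }
    }

    // Returns the cached value if `key` is unchanged since the last frame,
    // otherwise rebuilds it with `build` and caches the result.
    pub fn get_or_rebuild(&mut self, key: K, build: impl FnOnce(&K) -> V) -> &V {
        let stale = match &self.entry {
            Some((cached, _)) => *cached != key,
            None => true,
        };
        if stale {
            let value = build(&key);
            self.entry = Some((key, value));
        }
        &self.entry.as_ref().unwrap().1
    }
}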

Once all nodes have finished their construction phase and declared all resources, the graph can be optimized and scheduled. The optimization/reduction passes can take care of things like merging multiple declared NodeExecution::pass instances into a single subpass, and then multiple subpasses into a single render pass with the right subpass dependencies. This also allows for easy derivation of optimal LoadOp and StoreOp settings and for putting barriers only where they are needed. After the internal single-frame graph is fully reduced, the executions are scheduled for parallel execution.
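
As a very rough illustration of the LoadOp/StoreOp derivation (the enum names mirror gfx-hal, but the logic is only a sketch of the idea, not the actual scheduler):

#[derive(Debug, PartialEq)]
pub enum LoadOp { Load, Clear, DontCare }

#[derive(Debug, PartialEq)]
pub enum StoreOp { Store, DontCare }

// What the scheduler knows about an attachment once every node has
// declared its usage for the frame.
pub struct AttachmentUsage {
    pub written_before: bool,  // written by an earlier pass this frame?
    pub clear_requested: bool, // does this pass ask for a clear?
    pub read_after: bool,      // read by a later pass, or presented?
}

pub fn derive_ops(usage: &AttachmentUsage) -> (LoadOp, StoreOp) {
    let load = if usage.clear_requested {
        LoadOp::Clear
    } else if usage.written_before {
        LoadOp::Load
    } else {
        LoadOp::DontCare
    };
    let store = if usage.read_after {
        StoreOp::Store
    } else {
        StoreOp::DontCare
    };
    (load, store)
}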

This project is a good reference. https://github.com/pumexx/pumex

I didn't read it thoroughly, but it looks more like the old graph implementation (except for the windowing problems with it). Could be wrong though.

As far as I understand it, the actual scheduling of the "render group" closures into passes and subpasses would happen after the construction stage. At that point, the graph should have all the information from the various use_* calls to figure out dependencies, set up attachments and all that stuff.

However, with the proposed interface, I don't think the scheduler actually has enough information to decide what can go into subpasses, and what needs to be bumped into a separate pass, right? The graph would also need a hint about whether only the same pixel is accessed in the attachment, because that is a requirement of a subpass.

I suppose just a separate parameter to the use_color and use_depth functions would do the trick?
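
For illustration, such a hint could be something like this (the name and variants are made up):

// Hypothetical hint a node could pass alongside its resource usage:
// does it only ever touch the pixel currently being rendered?
pub enum AccessLocality {
    // Only the current fragment's pixel is accessed; the dependency can be
    // expressed as a subpass dependency within the same render pass.
    PixelLocal,
    // Arbitrary locations may be accessed (e.g. sampling); this forces a
    // separate render pass with a barrier in between.
    Global,
}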

I guess my above answer presumes the new dynamic graph would want to support multiple subpasses, which as I understand it are not supported currently. I don't really see any reason to omit subpass support when things are changed around anyway.

The proposed approach of merging nodes into subpasses through optimizations on the graph after construction seems a bit fragile to me.

That approach, at least as presented, makes the index of the different attachments an implementation detail of the node. This means that in order to ensure different nodes are merged into the same subpass, or into different subpasses within a pass, you would have to consider that implementation detail of the nodes.

I have a couple of ideas:

  1. Make framebuffers/attachments a totally separate concept in the graph. Graphics nodes would then take a FramebufferId as a parameter. The framebuffer could be set up and managed by a regular node (see the sketch after this list).
    Have a separate AttachmentId, which is obtained when building the Framebuffer node and which needs to be passed as a parameter into the nodes that use that attachment. This would require the pipeline to be compiled to reference the correct attachments; if the attachments are allocated in the Framebuffer node at build time, they would be known to the consuming node at build time at the earliest.
  2. Make attachments, framebuffers, and all that jazz managed by the node itself. This would lead to a more complex node type that represents a full render pass. That node could then have multiple subpasses. This is less flexible, but at least it would be more flexible than what we have today at the node graph level.
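
A rough sketch of what idea 1 could look like from a node's point of view (all names here are made up):

// Hypothetical ids handed out by a dedicated framebuffer node.
pub struct FramebufferId(pub usize);
pub struct AttachmentId(pub usize);

// Output of the framebuffer node. Downstream graphics nodes would take these
// ids as regular parameters instead of declaring attachments themselves, so
// pipelines can be compiled against known attachments at build time.
pub struct FramebufferOutputs {
    pub framebuffer: FramebufferId,
    pub color: AttachmentId,
    pub depth: AttachmentId,
}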

Any thoughts? @omni-viral @Frizi

I am still getting used to the concepts in Vulkan, and it seems I had a bit of a misunderstanding. A lot of what I wrote above was based on not knowing that the indices used in the shaders actually reference bindings made when creating the subpass, and not when creating the pass as a whole.