FosterFramework/Foster

Refactor rendering to use SDL3 GPU

NoelFB opened this issue · 15 comments

NoelFB commented

This is more of just a long term intention of replacing the current custom rendering API implementations with SDL3's GPU api, once it is ready.

This will likely require breaking changes to the C# Shader API as I believe the plan is for SDL3 to have a custom shader language.

I'm likely going to try implementing https://github.com/thatcosmonaut/SDL/tree/gpu in a branch. It's a SDL3 GPU implantation proposal that I think has a good chance of going through, and I've been following its development for quite a while. I think I'll likely keep OpenGL support as well to keep potential webgl and older platforms functional, but long term if it works out I might move to it entirely.

At which point, the entire current C layer's usefulness is really up for debate. There's still a fair amount of stuff (image loading, font loading) that really requires a native library but SDL3 (with this GPU implementation) really covers everything else required...

Sounds great. I've been following development as well and I'm interested to see how complex using the new API is in practice compared to using particularly verbose backends directly (looking at you Vulkan). From what I can see it's not horrible, but still certainly a big shift in thinking from OpenGL. Also curious as to if there's any performance gains to be had.

still certainly a big shift in thinking from OpenGL

Yeah, Foster in general is set up to abstract a lot of that away because it's intended for simple 2D games, but the tradeoff is it doesn't support a lot of the more complex options that a modern 3D rendering API has. I could potentially see a C# library that is more 1-1 with everything SDL provides being pretty cool... I'm not sure if that makes sense for Foster or not, though.

Keeping OpenGL is a bit tricky due to the differences in how Uniforms work. Newer APIs tend to just use "Uniform blocks" where you just pass blobs of data for vertex/fragment uniforms and it's up to you to make sure the data matches what the shader expects. Where as OpenGL (or WebGL) combines the shader programs and then has a big list of named Uniforms you're expected to assign individually. I do like the new system a lot more but I'm not sure how to handle this gracefully without having to awkwardly have the user build a table of all their uniforms if they use OpenGL (which I believe is what sokol gfx does)

I implemented a wrapper abstraction over sokol_gfx called a CF_Material. The material design holds a lookup table of string-key'd uniforms and composites them, at binding time (to a shader for a draw call), into a block of memory. This block of uniform data (in typical CPU memory still) gets handed to sokol_gfx which then funnels into the graphics backend in whatever way it needs to.

Now on the user side the material just acts as catch-bucket of cf_material_set_uniform_fs or cf_material_set_uniform_vs, where you set floats/textures onto the material by name. The name matches with the name in the shader for the uniform, as you'd expect. The matching happens at bind-time (like mentioned above), and any uniforms in material that don't exist in the shader are simply ignored. Any uniforms in the shader that aren't in the material get logged for warning and cleared to zero (the CPU uniform block is cleared to zero initially, so these zeroes get sent to the uniforms in the GPU even when missing in the material).

I think this has been an extremely easy to use design and wanted to share the design here. I figured Noel might like the design, or at least it could jog some ideas on how you want Foster to shape up in the future regarding shaders and uniforms.

Here's a brief sketch of my current material API for design reference:

CF_INLINE Material make_material() { return cf_make_material(); }
CF_INLINE void destroy_material(Material material) { cf_destroy_material(material); }
CF_INLINE void material_set_render_state(Material material, RenderState render_state) { cf_material_set_render_state(material, render_state); }
CF_INLINE void material_set_texture_vs(Material material, const char* name, Texture texture) { cf_material_set_texture_vs(material, name, texture); }
CF_INLINE void material_set_texture_fs(Material material, const char* name, Texture texture) { cf_material_set_texture_fs(material, name, texture); }
CF_INLINE void material_clear_textures(Material material) { cf_material_clear_textures(material); }
CF_INLINE void material_set_uniform_vs(Material material, const char* block_name, const char* name, void* data, UniformType type, int array_length) { cf_material_set_uniform_vs(material, block_name, name, data, type, array_length); }
CF_INLINE void material_set_uniform_fs(Material material, const char* block_name, const char* name, void* data, UniformType type, int array_length) { cf_material_set_uniform_fs(material, block_name, name, data, type, array_length); }
CF_INLINE void material_clear_uniforms(Material material) { cf_material_clear_uniforms(material); }

Happy to discuss ideas more here or elsewhere if that's helpful!

Thanks for sharing some thoughts here, this is definitely a cool direction. With SDL3's GPU, it doesn't actually expose any of the uniform names in the API as far as I know (and instead just uses blocks, and expects your data to match the block). I could maybe have some kind of tool to output the shader uniforms beforehand, or alternatively require the user to provide those. (ex. when you create a shader you also need to provide the uniforms that exist on it).

Alternatively I could just make it more 1-1 with SDL_GPU and just require the user to submit blocks of data, but that's not quite as nice as using uniform names in my opinion...

I’m also trying to implement SDL_Gpu. I wanted to be able to pass in glsl and have it cross compile to all backends, so I’ve tried statically linking in some other spirv libraries. This is pretty annoying, but they are all open source and free. This is also temporary until icculus provides a better SDL_shader_tools. I have the online shader compilation stuff (glsl -> SPIRV) behind an optional cmake feature flag, since it does bloat exe size, takes more build time, and requires python3 for cmake to configure properly).

So far I’ve got runtime shader compilation from glsl to SPIR-V, which can be saved to disk as blobs and shipped. Then use a tool made by the SDL_Gpu team to cross-compile the bytecode to whatever backend SDL_Gpu is using — just like in the SDL_Gpu examples.

Since I’ve got the glsl -> SPIR-V blob working it was easy to also using SPIRV-Reflect (header only library) to pull out reflection info from the shaders. I’m pulling out those four count variables that SDL_GpuCompileFromSPIRV needs, but also vertex input layout, names, types, and more importantly uniform block names and names of all the members, types, offsets in the uniform block. The reflection is filling in these little structs internally:

struct CF_UniformBlockMember
{
	const char* name;
	CF_UniformType type;
	int array_element_count;
	int size; // In bytes. If an array, it's the size in bytes of the whole array.
	int offset;
};

On the CPU side it’s not too much trouble to then map user uniforms to a little block of memory with matching layout to the shader. Then one call to SDL_GpuPushVertexUniformData can be issued.

This way users can just write some glsl, send in uniforms by name keys, and be done with it. Since the shaders are compiled online it’s easy to add in hotloading support as well.

I imagine you’d probably want something like for Foster as well, to simplify uniform workflow for users and make authoring the shader have less manual steps.

Edit: Example reflect code libsdl-org/SDL#9312 (comment)

There's still a fair amount of stuff (image loading, font loading) that really requires a native library

I know SDL_image and SDL_ttf exist, those might be of use.

I got the very early basics of this working in a branch in Foster... Currently shaders/materials are very hacked in (hard-coded, no reflection, only spir-v, etc) but it seems promising! It wasn't very hard to get the basics working.

image

I've gotten this to a fairly stable place in the sdl_gpu branch. The main things now are shader reflection (automatically figuring out uniforms) and shader cross compiling (which may already work but requires additional shared libraries, and I want to fully understand what it needs). The actual SDL_GPU implementation all works, albeit somewhat naively - I'm sure it could be improved a lot.

image

Been working on getting shared transfer buffers and separate upload/render buffers working:
https://github.com/MrBrixican/Foster/tree/upload_render_command_buffers

It's now functional, and shows pretty substantial performance increases in stress tests.

There's a few things that I think still need work:

  • Swapchain lifecycle
    • Since I based this off FNA3D which uses a fake framebuffer to support depth by default the ordering of swapchain lifecycle calls may not be correct, needs review
  • Make SDL_CreateGPUDevice debug_mode configurable/off in release
    • In my testing this can have a noticable impact on performance
  • Buffers should perhaps be sized exactly on their first resize/creation and then use next power of two
  • Review state lifecycle/resets/bindings
  • Buffer resizes lose contents, which may not be ideal
  • Stencil/depth operations need review to make sure we support most usage scenarios
  • Mailbox/Immediate mode rendering is potentially handled poorly, since swapchain sometimes gets set to null which causes render operations to get fail silently and copy operations to be wastefully executed
  • Previously some resources were tracked and disposed of automatically, that is no longer the case (especially concerning in cases like SpriteFont where textures can be leaked)

This looks really good! I definitely want to merge this into my branch when ready, and continue on from there.

Since I based this off FNA3D which uses a fake framebuffer to support depth by default the ordering of swapchain lifecycle calls may not be correct, needs review

Yeah that makes sense... I don't want to use a faux framebuffer on our end - the swapchain should just be used to present the final result, not for anything else. FNA has to support that due to how XNA was setup.

Make SDL_CreateGPUDevice debug_mode configurable/off in release

In general I want to figure out a better way to do this. When NuGet packages get uploaded they are set to Release mode, which means even if the end-user is running in Debug the Framework will still run under Release mode, and thus they won't get debug information/validation. I used to have a bunch of Debug.Assert throughout, until I realized they wouldn't do anything for the NuGet package. There must be some way to improve this?

Buffer resizes lose contents, which may not be ideal

Yeah I think the Mesh API will just need to make this clear, or add a flag to preserve it (in which case we'd need to copy the data and re-upload it). I would kind of be in favor of 2 methods (one to just upload/set buffer size, and one to do partial uploads that must fit into the existing size).

Mailbox/Immediate mode rendering is potentially handled poorly

I need to learn more about this as I don't totally understand how best to solve this.

Previously some resources were tracked and disposed of automatically, that is no longer the case (especially concerning in cases like SpriteFont where textures can be leaked)

Can you clarify how this breaks right now? I changed the way disposing works as I had to be more careful with OpenGL but I'm not sure what could go wrong now.

Previously some resources were tracked and disposed of automatically, that is no longer the case (especially concerning in cases like SpriteFont where textures can be leaked)

My mistake, I misinterpreted some of the changes. There is no risk of leaking persay, but I have a few concerns:

  • I'm not sure of thread safety of graphics resource destroy operations (I assume it's fine, just haven't looked into it).
  • Graphics resources may not be destroyed in proper order or not at all on application exit
    • In all SDL3_GPU examples, graphics resources are destroyed before the device, there is no guarantee of that with finalizers
    • Finalizers have no guarantee to run on program termination in .NET
    • Some classes, like SpriteFont, reference graphics resources yet have no dispose method or finalizer themselves, which leaves a user no option but to null out references and call GC.Collect()

With these concerns in mind, I think it might still be worthwhile to still track graphics resources and destroy/dispose of them on shutdown.

Yeah, those are good points! I'll fix that up soon, I think it should be fairly easy to track and release them. Even if not required (but it very well might be?) it's good practice.

NoelFB commented

General update on this ... it's working quite well now, however I decided to add back an OpenGL renderer for compatibility. Long term everything should migrate to just SDL_GPU but given the feature set of Foster it feels like it makes sense to keep OpenGL for the time being as well.