edubart/sokol_gp

Question about color uniform

Closed this issue · 2 comments

Hello,

May I ask why colour is applied as a uniform and not as a vertex colour? If it were done through a vertex colour, I think it would decrease draw calls dramatically.

Thanks

Although the library supports color modulation, it was designed for workloads that draw most shapes without many color changes, like 2D games with lots of sprites, where you typically use a white color for textured rectangles.

Also, as an experiment I tested adding a color component to each vertex at creation time (the code for that is in the branch https://github.com/edubart/sokol_gp/tree/colored_pips). In my benchmarks and research, adding color to each vertex decreased overall performance in the typical target usage I had in mind, so the idea was dropped.

The rationale is that adding color to each vertex puts extra overhead on the CPU<->GPU bus throughput, because every draw call pushes more bytes to the GPU, even when batching, and this decreases the vertex throughput you can dispatch to the GPU in a frame. By saving the color in a uniform before a batched draw, you avoid pushing lots of bytes to the GPU when batching shapes that share the same color.

So in summary, rendering fast is not only about minimizing the number of draw calls, but also about minimizing how many bytes you push to the GPU every frame.

Concrete example

Let's give a concrete example. The current vertex looks like this:

typedef struct sgp_vec2 {
    float x, y;
} sgp_vec2;
typedef struct _sgp_vertex {
    sgp_vec2 position;
    sgp_vec2 texcoord;
} _sgp_vertex;

That means each vertex uses 4 floats, and thus 16 bytes per vertex. If we add a color component to it, we would have something like:

typedef struct _sgp_vertex {
    sgp_vec2 position;
    sgp_vec2 texcoord;
    float r, g, b, a;
} _sgp_vertex;

That doubles the vertex size, making each vertex 32 bytes! You could think of using a uint8 per channel to use fewer bytes, but then the vertex is no longer aligned to 16 bytes, and the GPU is not happy with that! Also, the uint8-to-float conversion in the fragment shader adds overhead.

So to have a vertex color we have to double the number of bytes per vertex, making it 32 bytes. That means if we dispatch something like 20000 vertices per frame, a typical scenario games can reach, we would push 20000*32 bytes per frame to the GPU. I was benchmarking typically above 1000 FPS, and at that rate it would push 20000*32*1000 bytes = ~640 MB per second, which is about 640 * 8 = ~5.12 Gbps of throughput to the GPU! Some GPUs struggle at that throughput even though they could easily process the simple shader. By not having colors in the vertex, the throughput is cut in half, which made a notable impact on FPS in my tests. In my research I found that many times the bottleneck limiting FPS is the CPU<->GPU bus and how much data you are pushing, and this library also tries to optimize that.

Oh, I did not think of the bandwidth. Thank you very much for the detailed explanation. It is a really well-designed library and I enjoy using it a lot. Another great API is the *_at version of the transformations; it makes hierarchical transformations the responsibility of the user in an easy way and keeps the "scenegraph" outside the library.