jpcy/ioq3-renderer-bgfx

q3dm1 fps drop in red tongue mouth caused by patch normals

Closed this issue · 3 comments

Hi, my fps drops to 2-8 fps inside the red tongue mouth in q3dm1. Using the Visual Studio 2017 profiler, I quickly figured out what's causing it:


	vec3 Vertex::getNormal() const
	{
#if 1
		return vec3
		(
			bx::bitsToFloat(half_to_float(normal[0])),
			bx::bitsToFloat(half_to_float(normal[1])),
			bx::bitsToFloat(half_to_float(normal[2]))
		);
#else
		return vec3( 1, 0, 0 ); // returning a constant normal avoids the drop to < 10 fps
#endif
	}

And the half_to_float function is called very often when inside the red mouth:

[Image: getnormal bgfx ioquake3]

It would probably make more sense to convert the short normal once, store the result, and just return that precomputed normal?
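A minimal sketch of that idea, not the renderer's actual code: `decodeHalf` is a hypothetical stand-in for the `half_to_float` + `bx::bitsToFloat` pair, and `Vec3` and `CachedNormalVertex` are illustrative names. It assumes the normal is stored as three IEEE 754 half floats, and it decodes them once when the normal is set so that `getNormal()` becomes a plain read.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical replacement for half_to_float + bx::bitsToFloat:
// decodes one IEEE 754 half-precision value to a float.
static float decodeHalf(uint16_t h)
{
	const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
	uint32_t exponent = (h >> 10) & 0x1Fu;
	uint32_t mantissa = h & 0x3FFu;
	uint32_t bits;

	if (exponent == 0)
	{
		if (mantissa == 0)
		{
			bits = sign; // signed zero
		}
		else
		{
			// Subnormal half: shift the mantissa up until it is normalized.
			exponent = 127 - 15 + 1;
			while ((mantissa & 0x400u) == 0)
			{
				mantissa <<= 1;
				--exponent;
			}
			mantissa &= 0x3FFu;
			bits = sign | (exponent << 23) | (mantissa << 13);
		}
	}
	else if (exponent == 0x1Fu)
	{
		bits = sign | 0x7F800000u | (mantissa << 13); // infinity or NaN
	}
	else
	{
		bits = sign | ((exponent + 127 - 15) << 23) | (mantissa << 13);
	}

	float result;
	std::memcpy(&result, &bits, sizeof result);
	return result;
}

struct Vec3 { float x, y, z; };

// Illustrative vertex that pays the conversion cost once, at write time.
struct CachedNormalVertex
{
	uint16_t packedNormal[3]; // half floats, e.g. for upload to the GPU
	Vec3 decodedNormal;       // CPU-side copy, decoded once

	void setPackedNormal(uint16_t x, uint16_t y, uint16_t z)
	{
		packedNormal[0] = x;
		packedNormal[1] = y;
		packedNormal[2] = z;
		decodedNormal = Vec3{ decodeHalf(x), decodeHalf(y), decodeHalf(z) };
	}

	// No per-call conversion: just return the cached value.
	const Vec3 &getNormal() const { return decodedNormal; }
};
```

The trade-off is 12 extra bytes per vertex on the CPU side. If that is too much, the same effect could be had by decoding the normals into a temporary array once before the loop that reads them repeatedly.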

Hi, thanks for fixing this so fast. I did another little performance test, because for some reason bgfx is more than 10x slower than the opengl1 renderer (opengl1: 400-500 fps vs bgfx: 30-70 fps).

One strange point is that more than 44% of a frame is spent in realloc:

[Image: bgfx_realloc]

Then the rest is mostly uniform/submit stuff:

[Image: bgfx_uniforms]

It would be great if this could get close to opengl1 performance. I would like to use bgfx as a WebGL renderer, since currently I'm using rend2, which I hacked to be GLES-only (and, for example, the skybox and framebuffer features no longer work).

jpcy commented

Setting r_fastPath to 1 should improve performance a little.

I haven't done any real profiling, since on an old machine I get 350+ fps, compared to ~500 fps with the opengl and opengl2 renderers, which is only about a 0.85 ms difference per frame. You're getting much worse performance...
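The frame-time figure follows from converting each fps value to milliseconds per frame and subtracting. A small sketch of that arithmetic (the function names are made up for illustration):

```cpp
// Milliseconds spent on one frame at a given frames-per-second rate.
constexpr double frameTimeMs(double fps)
{
	return 1000.0 / fps;
}

// Extra time per frame the slower renderer needs compared to the faster one.
constexpr double frameTimeDeltaMs(double slowerFps, double fasterFps)
{
	return frameTimeMs(slowerFps) - frameTimeMs(fasterFps);
}
```

With the numbers above, `frameTimeDeltaMs(350.0, 500.0)` is 1000/350 - 1000/500, roughly 2.857 - 2.0 = 0.857 ms. The same calculation for 30 fps against 400 fps gives about 30.8 ms per frame, which is why the reported drop is a much bigger problem than the raw fps gap on the older machine.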

You will have to use WebGL2, because I'm using GL 3.2 features like GLSL texelFetch for dynamic lighting.

Thanks a lot again. The last 3 commits, with r_fastPath enabled, raised the fps to around 40-120 in Debug mode.

For comparison, Debug mode:
opengl1: 200-500fps
opengl2: 190-300fps
bgfx: 40-120fps

I have now also tested the code in Release mode (32-bit), which is a complete game changer:

opengl1: 500-1000fps
opengl2: 200-400fps
bgfx: 600-1000fps

(hard to tell the exact fps over 1000fps, because the fps text is flickering so much)

I guess the lesson is that bgfx's use of malloc/free and its C++ code are just far slower than C code in Debug mode.

And it's good to know about the GL 3.2 features. I'll deal with that when I get to the point of porting this to WebGL2.