3b/cl-opengl

masking float traps around all FFI calls performance

mfiano opened this issue · 6 comments

A while ago, cl-opengl started masking float traps around all FFI calls by default, in commit 6faf5b0. That change was prompted by another bug I reported: shader compilation errors on a certain driver (Linux Intel Mesa).

I spent many hours profiling my game today, and was shocked at how expensive masking float traps around every call is. Therefore, I think the default should be reversed (push a feature to mask them, rather than masking by default).

Here is an example SBCL sb-sprof result of running a game in the Pyx framework: https://i.lisp.cl/5K2Zac.png

Under the long lavender RENDER-ENTITY block, the LAMBDAs directly beneath it contain all the draw calls for the different game entities each frame. The maroon-colored foreign... blocks at the leaves are calls to SBCL's arch_set_fp_modes, made as a result of masking float traps. As you can see, these foreign calls happen frequently and account for a good portion of the renderer's total time (30% in this case).

Here is a zoomed-in view of a single piece of geometry being rendered, using draw-arrays-instanced and bind-vertex-array: https://i.lisp.cl/JmiUBd.png - Here the cost of the masks can be clearly seen.

As another point in favor of reversing the feature logic here, I can no longer reproduce the shader compilation issue on Intel/Mesa.

To be honest, I am quite unsure why the float traps are masked for all FFI calls. Instead of inverting the feature, could we get rid of that and add a new feature for only masking float traps around shader compilation, also off by default?
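As a rough illustration of that proposal (not cl-opengl's actual code), an opt-in wrapper could be selected at read time by a feature. The feature name `:my-mask-shader-compile-traps` and the macro name below are hypothetical, and `sb-int:with-float-traps-masked` is SBCL-specific, so this sketch assumes SBCL:

```lisp
;; Hypothetical opt-in wrapper: masks float traps only when the (made-up)
;; feature :my-mask-shader-compile-traps was pushed onto *features*
;; before this form was read. Otherwise it expands to a plain PROGN,
;; so the default path pays no masking cost.
(defmacro with-shader-compile-traps-masked (&body body)
  #+my-mask-shader-compile-traps
  `(sb-int:with-float-traps-masked (:invalid :divide-by-zero :overflow :inexact)
     ,@body)
  #-my-mask-shader-compile-traps
  `(progn ,@body))
```

Only the shader compilation call sites would be wrapped in this macro, and draw calls would be left untouched.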

phoe commented

Should this issue be linked on the SBCL bugtracker, too?

@mfiano can you show us the textual sb-sprof report? I have no idea what's going on in those screenshots.

The way things are right now is the correct behaviour that invites the least surprise. The performance impact is not heavy enough to make it unusable in dev.

For release, one can simply compile with the trap switch disabled and instead mask traps globally, which ensures proper performance without stray errors.

3b commented

Added an option to mask traps at a coarser granularity without disabling masking completely. For example, you can wrap your main loop or render function with %GL:WITH-FLOAT-TRAPS-MASKED and get most of the performance of disabling it, while still being more correct in foreign calls and without affecting CL code outside that dynamic extent.
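The idea of masking once around a whole loop, rather than once per foreign call, can be sketched as follows. The thread does not show the argument list of %GL:WITH-FLOAT-TRAPS-MASKED, so this sketch uses SBCL's own `sb-int:with-float-traps-masked` as a stand-in; `render-frame` and `run-frames` are hypothetical placeholders for application code:

```lisp
;; Hypothetical stand-in for a per-frame function that would issue many
;; GL calls (bind-vertex-array, draw-arrays-instanced, ...).
(defun render-frame (entities)
  (let ((drawn 0))
    (dolist (entity entities drawn)
      (declare (ignorable entity))
      ;; A real renderer would make its foreign calls here.
      (incf drawn))))

;; Mask float traps once for the dynamic extent of the whole loop, so the
;; floating-point modes are changed on entry and exit rather than around
;; every individual foreign call.
(defun run-frames (frame-count entities)
  (let ((total 0))
    (sb-int:with-float-traps-masked (:invalid :divide-by-zero :overflow :inexact)
      (dotimes (i frame-count)
        (incf total (render-frame entities))))
    total))
```

Lisp code outside the wrapped form still runs with traps enabled, which matches the "without affecting CL code outside that dynamic extent" point above.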