Multicore CPUs (a question or feature request)
thisnickwasfree opened this issue · 9 comments
Are there any plans to move some parts of 3DreamEngine to other threads?
Sending the heaviest parts to another core could improve performance, I think. But I'm not a specialist in this area, so I want to hear other opinions.
Love2d itself has this possibility, but the question is what can safely be sent to other threads.
3DreamEngine is nice at the moment, but better performance would be welcome. One of the main problems, as I see it, is high CPU usage from both 3DreamEngine and the other code in Love2d, which can cause low fps when the CPU core they share is loaded close to 100%.
I threaded most of the resource loading and I'm working on making it more useful. But that's only relevant for memory-intensive games.
The engine itself relies on the love.graphics module, which is not threadable. The CPU usage, however, is too high. I have to perform some profiling to determine where I waste the most power.
One important thing: keep draw calls as low as possible. It is no problem for the GPU to render 60k polygons at once, but more than 500 individual objects kill the CPU.
I'm currently working on a mesh combiner (I need it for a block-based platformer, but it works for everything). I hope this can reduce CPU usage for most projects.
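To illustrate why combining meshes lowers the draw-call count, here is a minimal, engine-agnostic sketch in Python. It is not the 3DreamEngine mesh combiner: the function name `combine_meshes` and the `(vertices, indices)` layout are assumptions made for illustration, and the vertices are assumed to be in world space already.

```python
# Minimal sketch: merge several indexed meshes into one, so that the result
# can be submitted with a single draw call instead of one call per object.
# All names are hypothetical; this is not 3DreamEngine's API.

def combine_meshes(meshes):
    """Merge (vertices, indices) pairs into one vertex list and one index list."""
    combined_vertices = []
    combined_indices = []
    for vertices, indices in meshes:
        # Indices of this mesh must be shifted by the number of vertices
        # that were already appended from earlier meshes.
        offset = len(combined_vertices)
        combined_vertices.extend(vertices)
        combined_indices.extend(i + offset for i in indices)
    return combined_vertices, combined_indices


# Two triangles, each with its own three vertices (x, y, z).
tri_a = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
tri_b = ([(2, 0, 0), (3, 0, 0), (2, 1, 0)], [0, 1, 2])

vertices, indices = combine_meshes([tri_a, tri_b])
```

The trade-off is that a combined mesh can only be moved or culled as a whole, so this suits static geometry such as blocks or scenery best.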
keep draw calls as low as possible
Does 3DreamEngine have any built-in counters? I guess the number of render calls is much larger than the number of drawn objects: shadows, at least, add extra render calls, and they are probably not the only source, right?
I have to perform some profiling to determine, where I waste most power.
Well, some months ago you already improved performance dramatically, so I believe your research will pay off. I'm extremely interested in this succeeding.
Writing here because this is about performance again.
As I understand it, small objects hidden by bigger, closer ones will be rendered anyway?
I'm getting a noticeable fps drop after spawning a lot of small objects behind a wall, where the camera cannot see them. As I understand it, shadows and reflections make this necessary (well, additional calculations would be needed to find the objects that do not affect shadows and reflections and exclude them from rendering). But… Is it possible to add an optimization that excludes these «invisible» objects?
I remember you said in the alpha-dithering issue that there were some problems with z-sorting, so I'm not sure whether the optimization I'm talking about is possible in 3DreamEngine, but it doesn't hurt to ask.
Sometimes. The GPU skips the fragment shader if the object is not visible, but if the objects happen to be drawn back to front, it can't optimize.
This can be improved by sorting front to back. That is the opposite of the order alpha blending requires, but since those are two separate passes it is no problem. However, the CPU time spent sorting may not be worth the GPU time saved. At least that is what I think; I will look into this.
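As a rough sketch of the two sort orders (opaque objects front to back, alpha-blended objects back to front), assuming each object is a plain dict with hypothetical `pos` and `alpha` fields; this is an illustration in Python, not how 3DreamEngine stores or sorts its scene:

```python
# Minimal sketch of per-pass draw ordering. Field names are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance; enough for ordering, avoids the sqrt."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def order_for_passes(objects, camera_pos):
    """Return (opaque, transparent) lists in the order they should be drawn."""
    opaque = [o for o in objects if not o["alpha"]]
    transparent = [o for o in objects if o["alpha"]]
    # Opaque pass: nearest first, so the depth test can reject hidden
    # fragments of farther objects before shading them.
    opaque.sort(key=lambda o: squared_distance(o["pos"], camera_pos))
    # Alpha pass: farthest first, so blending composites correctly.
    transparent.sort(
        key=lambda o: squared_distance(o["pos"], camera_pos), reverse=True
    )
    return opaque, transparent


scene = [
    {"name": "crate", "pos": (0, 0, 5), "alpha": False},
    {"name": "window", "pos": (0, 0, 3), "alpha": True},
    {"name": "wall", "pos": (0, 0, 2), "alpha": False},
    {"name": "smoke", "pos": (0, 0, 8), "alpha": True},
]
opaque, transparent = order_for_passes(scene, camera_pos=(0, 0, 0))
```

Whether the sort pays off depends on the scene: it costs CPU time every frame, which is exactly the resource in short supply here.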
The usual approach is something like the following: manually define which objects are visible from your current position. For example, you are in room 5, and from there you can see rooms 1, 3 and 7 (and of course 5 itself).
Another approach is alpha planes or something similar.
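A minimal sketch of such a hand-written visibility table, using the room numbers from the example. The names `VISIBLE_FROM` and `objects_to_draw`, and the `room` field on each object, are assumptions for illustration and not part of 3DreamEngine:

```python
# Minimal sketch of manually defined room visibility. Names are hypothetical.

# For each room the camera can be in, the set of rooms that may be visible.
VISIBLE_FROM = {
    5: {1, 3, 5, 7},
}


def objects_to_draw(objects, current_room):
    """Keep only objects located in rooms visible from current_room."""
    # Rooms without an entry fall back to seeing only themselves.
    visible_rooms = VISIBLE_FROM.get(current_room, {current_room})
    return [o for o in objects if o["room"] in visible_rooms]


objects = [
    {"name": "table", "room": 5},
    {"name": "barrel", "room": 2},
    {"name": "lamp", "room": 7},
]
drawn = objects_to_draw(objects, current_room=5)
```

The lookup is cheap compared with any per-object occlusion test, at the price of maintaining the table by hand.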
I will research those possibilities and might implement a proper method.
Also, and this is in progress, you will be able to set custom rules for shadows and reflections. That means you can exclude those tiny objects from the shadows. I will come back to this thread once it is finished.
In addition, I am investigating instances further and why they do not yield the promised speed yet. They might resolve this issue too.
My current performance optimizations are making progress too. The Tavern demo now runs ~40% faster :)
Done. Depending on the scene, of course, there should be a major boost. The Tavern demo (on the previous settings) runs 80% faster.
I have ideas for threading parts of the rendering process, but the gains would be minor for a lot of work, so I will keep this ticket open and see whether it's worth it later.
Because of the many changes you might need to update your settings, but I "think" it should work fine. Please report any errors I made. Since I rewrote how shaders are constructed, some might occur. I tested my demos, but those do not cover everything.
Reporting: it looks like it's impossible to switch on fxaa/msaa now: I see wide gaps between tiles and no blur/blending on objects textured with alpha-channel textures.
As for the fps: I got 100-125 vs 80-90 in the same scene, and the fps looks more stable and no longer drops as dramatically as before.
100-125
I hope 100-125 is with the new version :) Note that smooth shadows are now enabled by default. They require additional GPU power.
I hope 100-125 is with the new version :)
Yes, the old version was 80-90. During fps drops, where the old version could show 45-60, the lowest with the new one was ~73.
As for shadows: they are not used in the current project.
CPU time has been lowered further. Multicore could be used in the scene builder and I'm considering it, but we are talking about around 0.1-0.2 ms for common scenes. For multicore rendering the current OpenGL API is not sufficient; maybe I will look into LÖVE's code, but not now. Instead I want to tackle small-object combiners and similar, since they can increase performance a lot. I mean, A LOT. It could be done manually by the programmer, but that's boring.