z0w0/helm

Only re-render if the model has changed

z0w0 opened this issue · 16 comments

z0w0 commented

During the development of Helm 1.0, the sample mechanics introduced in previous versions (which prevented excess rendering) were removed. To optimize rendering, the render function should be changed to require an Eq instance on the engine user's model type and then check that the model has changed before rendering.



z0w0 commented

This was done by simply marking the model as dirty when the game model is changed by an action, and then this dirty flag is turned off after being rendered. Better than requiring an Eq constraint.
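A minimal sketch of the dirty-flag idea described above, not the actual Helm code. The names GameState, gsModel, gsDirty, applyAction and renderIfDirty are made up for illustration; the point is only that no Eq constraint on the model is needed.

```haskell
-- Hypothetical engine state: the user's model plus a flag recording
-- whether it has changed since the last render.
data GameState model = GameState
  { gsModel :: model
  , gsDirty :: Bool
  }

-- Applying an action always marks the model dirty. No Eq constraint is
-- needed because we never compare the old and new models.
applyAction
  :: (action -> model -> model) -> action -> GameState model -> GameState model
applyAction update action st =
  st { gsModel = update action (gsModel st), gsDirty = True }

-- Render only when the flag is set, then clear it.
renderIfDirty :: (model -> IO ()) -> GameState model -> IO (GameState model)
renderIfDirty render st
  | gsDirty st = render (gsModel st) >> pure st { gsDirty = False }
  | otherwise  = pure st
```

The trade-off is that an action which leaves the model unchanged still triggers one render, which seems acceptable compared with forcing Eq on every model type.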

Unfortunately, I don't think this is solved: both the flappy example and the restored hello example peg one CPU core at 100% while idle.

z0w0 commented

Hmm. I'll investigate this weekend. Happy if someone beats me to it :)

It may not be re-rendering but polling for events in a really tight loop. I did some profiling with the hello example:

stack install --profile
stack exec -- ghc -prof -fprof-auto -rtsopts examples/hello/Main.hs
examples/hello/Main +RTS -p

Which gives a Main.prof file:

	Wed Jun 28 15:08 2017 Time and Allocation Profiling Report  (Final)

	   Main +RTS -p -RTS

	total time  =        3.29 secs   (3293 ticks @ 1000 us, 1 processor)
	total alloc = 938,042,536 bytes  (excludes profiling overheads)

COST CENTRE            MODULE                 SRC                                             %time %alloc

tick                   Helm.Engine.SDL.Engine src/Helm/Engine/SDL/Engine.hs:(129,3)-(141,37)   35.0    1.2
pollEvent.\            SDL.Event              src/SDL/Event.hs:(679,37)-(683,43)               24.0    0.0
superstep.loop         FRP.Elerea.Param       FRP/Elerea/Param.hs:(167,5)-(176,64)             19.0   53.4
step                   Helm                   src/Helm.hs:(67,1)-(85,65)                        5.2    8.7
externalMulti.\.sample FRP.Elerea.Param       FRP/Elerea/Param.hs:469:18-89                     2.7    8.7
start.\                FRP.Elerea.Param       FRP/Elerea/Param.hs:(156,22)-(160,14)             2.4    0.0
pollEvent              SDL.Event              src/SDL/Event.hs:(679,1)-(683,43)                 2.4    9.9
initialize             SDL.Init               src/SDL/Init.hs:(61,1)-(63,39)                    1.5    0.0
superstep              FRP.Elerea.Param       FRP/Elerea/Param.hs:(164,1)-(176,64)              1.4    3.1
moves                  Helm.Mouse             src/Helm/Mouse.hs:(23,1)-(26,42)                  1.1    2.5
start                  FRP.Elerea.Param       FRP/Elerea/Param.hs:(152,1)-(160,14)              1.1    3.1
superstep.deref        FRP.Elerea.Param       FRP/Elerea/Param.hs:166:5-53                      0.4    3.1
addSignal.fin.\        FRP.Elerea.Param       FRP/Elerea/Param.hs:(190,37)-(192,56)             0.2    3.1
externalMulti.\        FRP.Elerea.Param       FRP/Elerea/Param.hs:(467,27)-(470,69)             0.1    1.2
memoise                FRP.Elerea.Param       FRP/Elerea/Param.hs:265:1-64                      0.1    1.9
z0w0 commented

I'd thought that the SDL pollEvent wouldn't be intensive.
What exactly are the possible solutions to this?
Should there be a thread delay, perhaps?

z0w0 commented

Based on the documentation for SDL.Time.Event, I'm inclined to believe we shouldn't be using any of the SDL2 API wait implementations and should instead just block the thread using GHC's threadDelay (especially since the game loop is on the Haskell end). I think adding two new game configuration options (with sane defaults) will suffice: updateLimit, a delay in milliseconds for sleeping the thread (I think 1ms would be a fine default), and fpsLimit, a soft limit on the rate at which the game renders frames.

There might be games where input lag is acceptable for the gain of minimised CPU usage, so the dev may opt to increase the updateLimit.

Yeah, I think you are right about that although it's not explicitly documented.

z0w0 commented

Been under the weather and haven't had a chance to look at this yet and am going on holidays for a week tomorrow. Happy to Paypal $10 AUD through to any legend that fixes this before I get back :)

In particular:

  • Add the soft-limit to the rendering rate as two enum variants, data FPSLimit = Unlimited | Limited Int (defaults to 120FPS)
  • Add the hard-limit to the engine tick with threadDelay as two enum variants like the above, data UpdateLimit = Unlimited | Limited Double where the unit here is either milliseconds or microseconds (threadDelay takes microseconds IIRC)

These two should both go on the SDL engine config, or game config. Both would work, but game config might be more appropriate because those two settings won't be specific to the SDL engine.
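A rough sketch of what those two options could look like, assuming the limit is given in milliseconds. The constructor names are prefixed here (FPSUnlimited, UpdateLimited and so on) because two types in the same module can't both declare Unlimited and Limited; the helper names are made up too. threadDelay from Control.Concurrent does take microseconds, so the conversion is spelled out.

```haskell
import Control.Concurrent (threadDelay)

-- Soft limit on the rendering rate, in frames per second.
data FPSLimit = FPSUnlimited | FPSLimited Int
  deriving (Eq, Show)

-- Hard limit on the engine tick, as a sleep in milliseconds (assumed unit).
data UpdateLimit = UpdateUnlimited | UpdateLimited Double
  deriving (Eq, Show)

-- Defaults taken from the discussion: 120 FPS and a 1ms sleep.
defaultFPSLimit :: FPSLimit
defaultFPSLimit = FPSLimited 120

defaultUpdateLimit :: UpdateLimit
defaultUpdateLimit = UpdateLimited 1

-- threadDelay takes microseconds, so convert from milliseconds.
updateDelayMicros :: UpdateLimit -> Maybe Int
updateDelayMicros UpdateUnlimited    = Nothing
updateDelayMicros (UpdateLimited ms) = Just (round (ms * 1000))

-- Time budget for a single frame in microseconds, if there is a limit.
frameBudgetMicros :: FPSLimit -> Maybe Int
frameBudgetMicros FPSUnlimited = Nothing
frameBudgetMicros (FPSLimited fps)
  | fps <= 0  = Nothing
  | otherwise = Just (1000000 `div` fps)

-- Sleep once per engine tick; does nothing when unlimited.
sleepForUpdateLimit :: UpdateLimit -> IO ()
sleepForUpdateLimit = maybe (pure ()) threadDelay . updateDelayMicros
```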

P.S. I realised two other things related to this issue:

  • It's only polling one event every engine tick. It should poll all of the available events in a single tick; otherwise it pumps the game actions out of the subscriptions slowly, one per tick, and commands get run much more often than they should.
  • Exposed events aren't re-rendering the engine

This one is somehow connected to #118. I implemented FPSLimit Limited and the flickering disappeared. I am not proud of the implementation yet, as it was a POC, so I may hold on to it for a little while, but as soon as it is done I will open a PR. I plan to deliver it in two separate PRs: one for FPSLimit and another for UpdateLimit. The issues mentioned at the very bottom of the previous comment will need separate attention as well.

@z0w0 regarding the game config and using the name GameConfig for the record that holds FPSLimit and UpdateLimit: what options do you have in mind? My idea is to name it GameLifecycle, as it contains all the key functions that define a specific game. Any other ideas?

@z0w0 There is an interesting interdependency. If UpdateLimit is larger than FPSLimit it will override it, since there is DirtyModel as well. I also think there is no reason to update more often than FPSLimit allows, as it provides no benefit: the user will not see the result. Is there a way to simplify DirtyModel, FPSLimit and UpdateLimit, and what are the benefits of such granular configuration?
P.S. Thinking out loud.

Our current game loop implementation has the following constraints:

  • DirtyModel: indicates that the model was changed. If it was not, there is no reason to render, since the new frame would not differ from the previously rendered one.
  • FPSLimit: lets us save resources even when the model is updated, which can be helpful on constrained hardware such as mobile devices.

However, the current implementation still consumes all the CPU time it is given: even when we skip rendering a frame, we keep polling and processing SDL events nonstop. Moreover, with a naive implementation where the model changes on every SDL event and the FPS is Unlimited, the speed of the game depends directly on the speed of the hardware. The idiomatic approach right now relies on Time.fps in subscriptions, which generates update actions. This produces updates at a regular rate, each taking a delta time as input for interpolation. While that solves the problem of uncontrolled timing of game state updates, it does not solve the over-utilization of the CPU, as events are still polled continuously.

A classic game loop looks like the following:

loopStartTime <- …
loopUpdate updateLimit deltaTime
render
delayToCatchupWithFPSLimit loopStartTime

In our implementation we will not need to calculate deltaTime to pass into the function, as it is up to the developer to use Time signals in their implementation. However, we need to ensure that event polling as a whole can be constrained when needed. This is done by introducing UpdateLimit, which constrains the step function as a whole in the following way:

loopStartTime <- …
updateAsMuchAsUpdateLimitAllows
renderIfDirty
waitUpToFPSLimit

The resulting implementation is not much different from a classic game loop.
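As a rough illustration of that loop shape (a sketch only, not the code going into the PR), one iteration could look like the following. Everything here is hypothetical: loopOnce and remainingMicros are made-up names, the update step is simplified to a single call that reports whether the model changed, and the frame budget is passed in microseconds. It uses getMonotonicTimeNSec from GHC.Clock (base 4.11 or newer) and threadDelay.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (when)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- Microseconds left in the frame budget, never negative.
remainingMicros :: Int -> Word64 -> Word64 -> Int
remainingMicros budgetMicros startNs nowNs =
  max 0 (budgetMicros - elapsedMicros)
  where
    -- Guard against a clock reading earlier than the start time.
    elapsedNs     = if nowNs >= startNs then nowNs - startNs else 0
    elapsedMicros = fromIntegral (elapsedNs `div` 1000) :: Int

-- One loop iteration: update, render if dirty, then sleep for whatever
-- is left of the frame budget instead of spinning on the event poll.
loopOnce
  :: Int                  -- frame budget in microseconds (assumed unit)
  -> (s -> IO (s, Bool))  -- update step; the Bool says "model changed"
  -> (s -> IO ())         -- render
  -> s
  -> IO s
loopOnce budgetMicros update render st = do
  start        <- getMonotonicTimeNSec
  (st', dirty) <- update st
  when dirty (render st')
  now          <- getMonotonicTimeNSec
  threadDelay (remainingMicros budgetMicros start now)
  pure st'
```

When update plus render already exceeds the budget, remainingMicros returns 0 and the loop simply carries on without sleeping.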

There are several problems with this implementation that may require further thinking:

  • Update + Render time can exceed the frame budget implied by FPSLimit. This will cause frames to be skipped, while game logic still updates smoothly. We could change the prioritization so that the FPS stays constant but game updates happen less often, which may affect physics accuracy. We need to spend more time investigating the drawbacks of that approach.
  • UpdateLimit can be lower than the rate of Time.fps events within a game implementation; as a result, the actual number of updates will be lower than requested.

This is what I am going to implement. If anyone has any suggestions for improvement, please chime in.

z0w0 commented

Sounds exactly like what I was envisioning. In an ideal world, update would be on its own green thread and rendering would stay on the main thread. But that takes a bit of involvement, because some commands need to be on the main thread (things that interact with the underlying SDL window, for example).

OK, I got a POC working, but I now have challenges propagating the engine's quit state across the loop functions. Profiling shows much better numbers. I am going to polish the code, implement better propagation of the quit state and open a PR.

@z0w0 you mentioned "It's only polling one event every engine tick". Correct me if I am wrong, but the tick function is recursive.

      -- Sink everything else into the signals
      Just Event.Event { .. } ->
        sinkEvent engine eventPayload >>= tick

By the end it will sink all the events before proceeding to processing actions. Am I missing something there?
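To make the argument concrete, here is a small pure model of that recursion. It is an illustration only: pollOne stands in for SDL's pollEvent by popping from a list, and drainEvents plays the role of tick. Neither name exists in Helm.

```haskell
-- Hypothetical stand-in for pollEvent: pop one event off a queue, or
-- return Nothing when the queue is empty.
pollOne :: [e] -> (Maybe e, [e])
pollOne []       = (Nothing, [])
pollOne (e : es) = (Just e, es)

-- Recursive "tick": keeps sinking events until the poll returns Nothing,
-- so every queued event is handled before the function returns.
drainEvents :: (acc -> e -> acc) -> acc -> [e] -> acc
drainEvents sink acc queue =
  case pollOne queue of
    (Nothing, _)   -> acc
    (Just e, rest) -> drainEvents sink (sink acc e) rest
```

For example, drainEvents (\n _ -> n + 1) 0 "abc" walks all three queued events in a single call, which matches the claim that one tick sinks everything that is pending.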

Also, can you clarify "Exposed events aren't re-rendering the engine"? I'm not sure I understand that part.

z0w0 commented

Yep, you're right that the poll is correct. I misread the code at the time :)

Exposed events are emitted by SDL when a window above the game window is moved out of the way, i.e. the contents of the game window are re-exposed. We need to re-render the game window as soon as this exposed event is emitted, even if the model is not dirty, so that there's no delay in updating the screen. This complicates the code you've been working on a bit.