decline-cookies/anvil-unity-dots

Move TaskDrivers to execute like OO Commands


jkeon commented

Today, TaskDrivers are built around doing everything they can to process massive data sets in parallel without write contention.

The downside is that all data must be merged during a consolidation phase at the beginning of the frame, which introduces at least a one-frame delay whenever work passes from TaskDriverA to TaskDriverB.

To ensure consistency and avoid endlessly refactoring the ordering, we made that delay always exactly one frame.
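To make the delay concrete, here is a minimal Python sketch of a linear chain of TaskDrivers. It is a toy model, not Anvil's implementation: `simulate_chain`, the pending/readable buffers, and the single string element are all hypothetical. With one consolidation phase at the start of the frame, each hop costs a frame; consolidating right before each driver runs removes that cost.

```python
def simulate_chain(num_drivers: int, on_demand: bool, max_frames: int = 100) -> list[int]:
    """Return the frame on which each driver in a linear chain first sees the element.

    Hypothetical model: writes land in a driver's `pending` buffer and are only
    processed once they have been consolidated into its `readable` buffer.
    """
    pending = [[] for _ in range(num_drivers)]
    readable = [[] for _ in range(num_drivers)]
    first_seen = [-1] * num_drivers
    pending[0].append("element")

    for frame in range(max_frames):
        if not on_demand:
            # Current model: a single consolidation phase at the start of the frame.
            for i in range(num_drivers):
                readable[i].extend(pending[i])
                pending[i].clear()
        for i in range(num_drivers):
            if on_demand:
                # Proposed model: consolidate right before this driver runs.
                readable[i].extend(pending[i])
                pending[i].clear()
            for item in readable[i]:
                if first_seen[i] == -1:
                    first_seen[i] = frame
                if i + 1 < num_drivers:
                    pending[i + 1].append(item)  # hand off to the next driver
            readable[i].clear()
        if first_seen[-1] != -1:
            break
    return first_seen


print(simulate_chain(4, on_demand=False))  # [0, 1, 2, 3]
print(simulate_chain(4, on_demand=True))   # [0, 0, 0, 0]
```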

In practice, we've seen that this isn't the ideal way to use them. TaskDrivers tend to go deep, with many sub-TaskDrivers enacting all the functionality required to complete a "Task".

This can lead to situations where it takes a noticeable number of frames to get through the initial work before anything actually starts happening.

The proposed change is to consolidate on demand and run all TaskDriver jobs in order, based on their top-level TaskDriver.

Every top-level TaskDriver would get a chance to execute. It would schedule its jobs, its sub-TaskDrivers would schedule theirs, and so on down the chain. Because developers know the flow of events, they could order that scheduling to burn through as many jobs as possible in one frame, only stopping when they reach a job that needs to process its data over time, or when the work is complete.
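A minimal sketch of that ordering, assuming a TaskDriver can be reduced to a name plus an ordered list of sub-drivers. `TaskDriverNode`, `schedule`, and `schedule_frame` are hypothetical names, and scheduling a parent before its children is one plausible order, not a decision from this thread. Each top-level driver produces its own chain, so chains from different top-level drivers stay independent and are free to interleave.

```python
from dataclasses import dataclass, field


@dataclass
class TaskDriverNode:
    """Hypothetical stand-in for a TaskDriver: a name plus ordered sub-drivers."""
    name: str
    sub_drivers: list["TaskDriverNode"] = field(default_factory=list)

    def schedule(self, chain: list[str]) -> None:
        # Schedule this driver's own job first, then let each sub-driver
        # schedule its jobs in the order the developer listed them.
        chain.append(self.name)
        for sub in self.sub_drivers:
            sub.schedule(chain)


def schedule_frame(top_level: list[TaskDriverNode]) -> list[list[str]]:
    """Every top-level driver gets a turn and yields its own ordered job chain."""
    chains = []
    for driver in top_level:
        chain: list[str] = []
        driver.schedule(chain)
        chains.append(chain)
    return chains


# Usage, loosely modelled on the Wander hierarchy discussed in this issue.
wander = TaskDriverNode("Wander", [
    TaskDriverNode("Timer"),
    TaskDriverNode("Decision"),
    TaskDriverNode("Pathfind"),
    TaskDriverNode("FollowPath", [TaskDriverNode("Move")]),
])
chains = schedule_frame([wander, TaskDriverNode("Other")])
print(chains)
```

Each inner list stands for one chain of dependent jobs that could run back to back within a single frame.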

Should we re-evaluate this as we get to use TaskDrivers in more situations?
I am a bit hesitant to remove the bulk-processing efficiencies before we're sure that TaskDrivers do, in fact, tend to go deep.

Perhaps there should be two variants, or modes, for them to operate in.

jkeon commented

Yeah sorry, I should have been more clear. I do think there are two variants.

Previously, we had Driver vs. System jobs.

Driver jobs are unique to the Driver instance. If you have 10 TaskDrivers, there are 10 Driver jobs, and we run all 10 at the same time, each on its own piece of backing data.

If one driver has 10,000 elements and the other 9 each have 1, you still have 10 separate jobs running with a large imbalance.

So we offered the concept of System jobs. If the data and the processing were the same, 10 TaskDrivers could all write to the System, giving 10,009 elements in a single backing collection. We'd then run one job that could efficiently split those 10,009 elements across however many cores the chunk size called for.
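The imbalance can be sketched numerically. This is a hypothetical Python model, not Anvil or Unity code: `driver_job_sizes` treats each TaskDriver instance as one job, while `system_job_batches` merges all the data into one collection and splits it into fixed-size batches, roughly the way a parallel-for job hands out index ranges. The batch size of 64 is an arbitrary assumption.

```python
def driver_job_sizes(element_counts: list[int]) -> list[int]:
    """Driver model: one job per TaskDriver instance, sized by that instance's data."""
    return list(element_counts)


def system_job_batches(element_counts: list[int], batch_size: int) -> list[int]:
    """System model: merge every instance into one backing collection, then split
    it into fixed-size batches that can be spread across worker threads."""
    total = sum(element_counts)
    full, remainder = divmod(total, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


counts = [10_000] + [1] * 9           # one busy driver, nine nearly idle ones
driver_jobs = driver_job_sizes(counts)
batches = system_job_batches(counts, batch_size=64)

print(len(driver_jobs), max(driver_jobs))  # 10 10000
print(len(batches), max(batches))          # 157 64
```

With this split, the Driver model's largest job holds 10,000 elements, while the System model's largest unit of work is a 64-element batch (157 batches in total, the last holding the remaining 25), so the work can be spread evenly across cores.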

The downside we're running into is that, to make this work, we have to choose (or have Unity choose for us) a point in the frame at which to execute the jobs for a given TaskDriver/System combination. That is what causes the one-frame delay every time we convert an element into a different element.

I'll use Wander as an example.

  • WanderTaskDriver
    • TimerTaskDriver (Idle)
    • DecisionTaskDriver
    • PathfindTaskDriver
    • FollowPathTaskDriver
      • MoveTaskDriver

Today, the flow looks like this:

  • We come into Wander with a Driver Start
    • +1 frame
  • We kick off a Timer to wait a number of seconds for idling
    • +1 frame
  • Timer starts counting down
    • +X amount of seconds
  • Timer completes
    • +1 frame
  • We request a decision on where to go
    • +1 frame
  • We get the result
    • +1 frame
  • We pathfind to that location
    • +1 frame
  • We get the result
    • +1 frame
  • We trigger a follow path
    • +1 frame
  • Follow Path triggers a Move
    • +1 frame
  • Move updates our position
    • +X amount of seconds
  • Move complete
    • +1 frame
  • FollowPath gets the completion and kicks off the next move
    • +1 frame
  • Move updates the position again
    • +X amount of seconds
  • Continue until done
  • FollowPath notifies that it is fully complete
    • +1 frame
  • Wander gets the completion and kicks off the idle Timer again
    • +1 frame
  • REPEAT
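To show how the hand-offs add up, here is the current flow transcribed as Python data (step labels are shortened paraphrases of the list above). Counting the markers gives 13 frames spent purely on hand-offs in one pass through the listed cycle, on top of the three timed waits, and not counting any further Moves hidden behind "Continue until done".

```python
# The current Wander flow, transcribed from the list above.
# Each entry is (step, what must elapse before the next step runs);
# None marks "Continue until done", which has no delay listed.
CURRENT_WANDER = [
    ("Driver Start", "+1 frame"),
    ("Kick off idle Timer", "+1 frame"),
    ("Timer counts down", "+X seconds"),
    ("Timer completes", "+1 frame"),
    ("Request decision", "+1 frame"),
    ("Get decision result", "+1 frame"),
    ("Pathfind to location", "+1 frame"),
    ("Get path result", "+1 frame"),
    ("Trigger FollowPath", "+1 frame"),
    ("FollowPath triggers Move", "+1 frame"),
    ("Move updates position", "+X seconds"),
    ("Move complete", "+1 frame"),
    ("FollowPath kicks off next Move", "+1 frame"),
    ("Move updates position again", "+X seconds"),
    ("Continue until done", None),
    ("FollowPath fully complete", "+1 frame"),
    ("Wander restarts idle Timer", "+1 frame"),
]

handoff_frames = sum(1 for _, delay in CURRENT_WANDER if delay == "+1 frame")
timed_waits = sum(1 for _, delay in CURRENT_WANDER if delay == "+X seconds")
print(f"{handoff_frames} frames of hand-offs, plus {timed_waits} timed waits")
# prints "13 frames of hand-offs, plus 3 timed waits"
```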

All of those +1 frame delays add up, and they could go away.

The two variants I see right now are Driver and OverTime (the names are bad and TBD).

Regular Driver jobs are scheduled so that they can all execute in one frame, because they are chained together. Since they are triggered at the same time as the jobs of other top-level TaskDrivers, they will interleave with each other unless a specific shared dependency makes one driver's job wait on another's.

When we hit an OverTime job, that's when we bucket everything the way System jobs used to. These are the massively parallel jobs that execute once per frame so that something can happen over time, e.g. counting down a timer or moving something at a certain speed.

All the Driver jobs execute first, so they all get a chance to write into these buckets. Once every Driver is done, all the OverTime jobs get to update once at the end of the frame. They might write back to other Drivers, and those writes will get picked up the next frame.
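A rough Python model of one frame under this scheme, for illustration only: `Step`, `TopLevelDriver`, `run_frame`, and the frame-based durations are hypothetical, and the OverTime "bucket" is just a dict standing in for the bulk System-style data. Each top-level driver runs its Driver steps until it reaches an OverTime step, every OverTime entry is then updated once at the end of the frame, and a completed entry is written back so its driver continues on the following frame.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    overtime_frames: int = 0  # 0 = Driver step; N > 0 = OverTime step lasting N frames


@dataclass
class TopLevelDriver:
    name: str
    steps: list[Step]
    index: int = 0  # next step to run
    log: list[tuple[int, str]] = field(default_factory=list)  # (frame, step) when run/registered


def run_frame(frame: int, drivers: list[TopLevelDriver], bucket: dict[str, int]) -> None:
    # Driver phase: each top-level driver burns through its chained Driver steps
    # right now, stopping at its first OverTime step or the end of its flow.
    for d in drivers:
        while d.index < len(d.steps) and d.steps[d.index].overtime_frames == 0:
            d.log.append((frame, d.steps[d.index].name))
            d.index += 1
        if d.index < len(d.steps) and d.name not in bucket:
            step = d.steps[d.index]
            d.log.append((frame, step.name))
            bucket[d.name] = step.overtime_frames  # write into the shared bucket

    # OverTime phase: a single pass over the whole bucket at the end of the frame.
    by_name = {d.name: d for d in drivers}
    for name in list(bucket):
        bucket[name] -= 1
        if bucket[name] == 0:
            del bucket[name]
            by_name[name].index += 1  # write-back; the driver sees it next frame


def run(drivers: list[TopLevelDriver], frames: int) -> None:
    bucket: dict[str, int] = {}
    for frame in range(frames):
        run_frame(frame, drivers, bucket)


wander = TopLevelDriver("Wander", [
    Step("Driver Start"),
    Step("Kick off idle Timer"),
    Step("Timer counts down", overtime_frames=2),
    Step("Timer complete"),
    Step("Decision"),
    Step("Pathfind"),
    Step("FollowPath"),
    Step("Move"),
    Step("Move updates position", overtime_frames=3),
    Step("Move complete"),
    Step("FollowPath complete"),
])
run([wander], frames=8)
for frame, step in wander.log:
    print(frame, step)
```

In this model, every Driver step between two OverTime stages shares a frame (all of "Timer complete" through "Move" run on frame 2), so the only frame boundaries left are the OverTime updates and their write-backs. Additional top-level drivers passed to `run` would share the same bucket.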

Our Wander example becomes:

  • We come into Wander with a Driver Start
  • We kick off a Timer to wait an amount of seconds for idling
  • OVERTIME - Timer starts counting down
    • +X amount of seconds
  • Timer completes
  • We request to make a decision on where to go
  • We get the result
  • We pathfind to that location
  • We get the result
  • We trigger a follow path
  • Follow Path triggers a Move
  • OVERTIME - Move updates our position
    • +X amount of seconds
  • Move complete
  • FollowPath gets the completion and kicks off the next move
  • OVERTIME - Move updates the position again
  • Continue until done
  • FollowPath notifies of fully complete
  • Wander gets the completion and kicks off the idle Timer again
  • REPEAT

jkeon commented

The current parenting is kind of gross:

https://github.com/decline-cookies/anvil-unity-dots/pull/247/files#r1190560198

As part of this, fix this API and how it works.