Potential scheduler bug? nextMsgIdx not updated when scheduling to other hosts
Closed this issue · 1 comments
I'm adapting my compute/storage division-aware scheduler patches for faasm-ndp to the new code, and I noticed that in the blocks of code responsible for scheduling off to other nodes when the current one is busy, the nextMsgIndex
used for calculations later is not updated - it was updated in the for loop stepping above, maybe it was missed in a copy-paste?
https://github.com/faasm/faabric/blob/master/src/scheduler/Scheduler.cpp#L287-L391
for (; nextMsgIdx < nLocally; nextMsgIdx++) { // for loop - nextMsgIdx updated
...
remainder -= nOnThisHost; // but nextMsgIdx not updated
From a code review standpoint it seems that there are multiple variables manually tracked: nextMsgIdx
, remainder
(which should just always be total minus nextMsgIdx), and the actual messages vector. The vector is not mutated, so that's not a problem, but I think replacing either nextMsgIdx
or remainder
with a formula based on the other one (you could put it in a capture-by-reference local lambda to avoid duplication) would eliminate this bug and also prevent making similar mistakes. Also marking locals like nMessages
, isThreads
, cores
as const
would make it clear these are not changing in the function.
const auto remainder = [&](){ return nMessages - nextMsgIdx; };
if (remainder() > 0) {...}
or
if (nextMsgIdx < nMessages) {...} // without remainder
Hmm yes, you're probably right on that one, this iteration of the scheduler is yet to be tested in anger so there are probably a few more issues with it that are yet to be discovered. Although the external interface won't change, the internals of it will get rewritten at least once over the next two or three weeks, so I don't want to spend too much time perfecting this specific implementation. Once the design and logic is stable, I'll certainly incorporate your comments on style and structure as they are very valid. We're running a few experiments this week that should flush out that bug and any others too.