Review and document batch jobs
Closed this issue · 2 comments
Algunenano commented
Things that need to be clarified:
- How many jobs can run per user / host / server. There should be tests confirming that these limits are enforced.
- The scheduler's underlying architecture: how it syncs, when it needs locking, and how the locks work.
- Crash tolerance: what happens to the jobs if a server crashes or shuts down (these are two different cases).
- WIP Queues / index.
- Entry points and possible responses.
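On the first point, a test against the real scheduler would make the limit explicit instead of implicit. A minimal sketch of what such a test could look like, using a hypothetical in-memory scheduler (the actual per-user/host/server limits and the scheduler API are assumptions, not taken from the SQL API code):

```javascript
// Hypothetical sketch: an in-memory scheduler that caps concurrent jobs
// per user. MAX_JOBS_PER_USER is an assumed value; the real limits are
// exactly what the documentation and tests should pin down.
const MAX_JOBS_PER_USER = 1;

class FakeScheduler {
    constructor() {
        this.running = new Map(); // user -> count of running jobs
    }

    // Returns true if the job was accepted, false if the user is at the limit.
    add(user) {
        const count = this.running.get(user) || 0;
        if (count >= MAX_JOBS_PER_USER) {
            return false; // rejected: user already at the limit
        }
        this.running.set(user, count + 1);
        return true;
    }

    // Marks one of the user's running jobs as finished, freeing a slot.
    finish(user) {
        const count = this.running.get(user) || 0;
        if (count > 0) {
            this.running.set(user, count - 1);
        }
    }
}

const scheduler = new FakeScheduler();
console.log(scheduler.add('alice')); // true: first job accepted
console.log(scheduler.add('alice')); // false: over the per-user limit
scheduler.finish('alice');
console.log(scheduler.add('alice')); // true: slot freed again
```

A test in this shape, run against the real scheduler, would both document and enforce the per-user limit.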
Some odd things I've seen that are totally unclear to me at the moment:
- HostScheduler.prototype.add takes a lock but never unlocks it. It might be getting unlocked automagically via a TTL set in a different part of the code, which is bad and should be documented as such.
- JobBackend.prototype.clearWorkInProgressJob checks how many ids the user has on their WIP list, but then always removes the user from WIP. If a user can only have 1 running job, the deletion should be unconditional; if not, it shouldn't remove the user unless all tasks are cleared.
- Maintenance: there is a maintenance script to clean old jobs (older than 2 days), but it doesn't clear those jobs from their users' WIP lists and so on, and it doesn't have any tests. It needs tests and should reuse as much existing code as possible instead of doing direct redis calls for everything. If it doesn't need any other code, then it shouldn't live in the lib/ tree; it should be a separate entity that we can install and run separately. If possible I'd rather fix inconsistencies automatically in the code rather than with an external script.
Algunenano commented
Also, since this part of the SQL API has clear boundaries, we could take the time to update that part of the code to modern standards.
Algunenano commented
Closing stale ticket