Show currently running job
Closed this issue · 2 comments
I think it would be helpful, from both an observability and a troubleshooting point of view, to be able to see the currently executing job in ecChronos using ecctool.
I propose that ondemand/schedules get an additional state called "RUNNING". This state should only apply to one job at a time per ecChronos instance. When the scheduler is about to execute a job, it should set that job's state to RUNNING.
What is the difference between status:Started and status:Running?
Complicated answer ahead. For the TLDR, skip to the end.
A repair is split into lots of little repairs (subrepairs). Each repair competes, by priority, to run its subrepairs. Status is tracked at more than one level of a repair: the repair as a whole has a state (ON_TIME, OVERDUE, etc.), and each small piece (subrange) also has a state (Started, Finished, Failed).
Those subrepairs are what actually run the repair against the database. The proposed RUNNING state would be the state a large repair is in while it is in the process of running one of its subranges.
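To make the two levels concrete, here is a rough Python sketch of how a job-level RUNNING status could be derived from subrange statuses. The class and field names (`RepairJob`, `schedule_status`, `visible_status`) are made up for illustration and are not ecChronos code; only the status names come from this thread.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubrangeStatus(Enum):
    # Per-subrange statuses mentioned in this thread.
    STARTED = "Started"
    FINISHED = "Finished"
    FAILED = "Failed"


class JobStatus(Enum):
    ON_TIME = "ON_TIME"
    OVERDUE = "OVERDUE"
    RUNNING = "RUNNING"  # the proposed state, not an existing one


@dataclass
class RepairJob:
    # Hypothetical model: one job-level status plus a status per subrange.
    name: str
    schedule_status: JobStatus
    subranges: dict = field(default_factory=dict)  # token range -> SubrangeStatus

    def visible_status(self) -> JobStatus:
        # Report RUNNING while any subrange has started but not yet
        # finished or failed; otherwise fall back to the schedule status.
        if any(s is SubrangeStatus.STARTED for s in self.subranges.values()):
            return JobStatus.RUNNING
        return self.schedule_status


job = RepairJob(
    name="ks.tbl",
    schedule_status=JobStatus.OVERDUE,
    subranges={(0, 100): SubrangeStatus.FINISHED, (100, 200): SubrangeStatus.STARTED},
)
idle = RepairJob(name="ks.other", schedule_status=JobStatus.ON_TIME)
```

In this sketch `job.visible_status()` gives RUNNING because one subrange is still in the Started state, while `idle.visible_status()` stays ON_TIME.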
TLDR:
Started is saved in the repair history to denote when a subrepair has begun. However, currently only the status of the entire job is propagated to the user through ecctool. The proposed RUNNING status would be a status of the entire job, denoting that its ranges are being run.
It might be worth considering not creating a new state, and instead adding a separate command that shows the currently executing repairs, similar to what system_views.queries does for ongoing queries in Cassandra.
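As a rough illustration of that alternative, the scheduler could keep a small in-memory registry of what is executing right now, and a separate command could read a snapshot of it, leaving the existing job states untouched. This is only a sketch: `RunningJobs`, `track` and `snapshot` are hypothetical names, not part of ecChronos or ecctool.

```python
import threading
from contextlib import contextmanager


class RunningJobs:
    """Hypothetical registry of repairs that are executing right now."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}  # job id -> token range currently being repaired

    @contextmanager
    def track(self, job_id, token_range):
        # Register the job for the duration of the subrange repair and
        # always unregister it, even if the repair raises.
        with self._lock:
            self._running[job_id] = token_range
        try:
            yield
        finally:
            with self._lock:
                self._running.pop(job_id, None)

    def snapshot(self):
        # Return a copy so callers never see the dict change underneath them.
        with self._lock:
            return dict(self._running)


registry = RunningJobs()
with registry.track("job-1", (0, 100)):
    during = registry.snapshot()  # what a "show running" command would print
after = registry.snapshot()
```

Here `during` holds the single entry for `job-1` and `after` is empty again. The trade-off compared with a new RUNNING state is that nothing in the stored job status changes; the view only exists while the process is running.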