exception in service.py causes tasks to be skipped
ultra-sonic opened this issue · 7 comments
Hi Timur,
we are currently running into an issue where exceptions (like IOError [Errno 28] No space left on device) cause a task to be skipped and want to ask if this would be the right place to fix this:
cgru/afanasy/src/render/taskprocess.cpp
Line 113 in 1634d57
or should this rather be done here:
cgru/afanasy/src/server/taskrun.cpp
Line 154 in 1634d57
I cant find any other reference to af::TaskExec::UPSkip so I would propose setting the status to AFJOB::STATE_ERROR_MASK instead of AFJOB::STATE_SKIPPED_MASK in this case.
I also cannot imagine a case where an empty command would be sent on purpose.
do you think this is safe to do?
Hi, Oliver!
Zero command size (an empty string) is designed to allow service to decide to skip a task for some reason.
When service check something and want to ask server to skip the task, it can set command = "".
So if your service clears command for any reason, task will be skipped.
If you want make service to set ERROR on en empty command, you can change UPSkip
to UPFailedToStart
in this line:
cgru/afanasy/src/render/taskprocess.cpp
Line 113 in 1634d57
May be i should make service to skip task in some other way, not by an empty command.
Also it will be good to have some service initialization check, an on its fail send some UPFailedToInitializeService
.
Thanks for the quick reply Timur!!! Much appreciated!
May be i should make service to skip task in some other way, not by an empty command. Also it will be good to have some service initialization check, an on its fail send some
UPFailedToInitializeService
.
This would be great to have for sure!
If you want make service to set ERROR on an empty command, you can change
UPSkip
toUPFailedToStart
will af::TaskExec::UPFailedToStart really be treated as an ERROR and avoid the failing rendernode? i dont see this case setting m_progress->state = m_progress->state | AFJOB::STATE_ERROR_MASK;
Not on server, on render in this line:
https://github.com/CGRU/cgru/blob/master/afanasy/src/render/taskprocess.cpp#L121
I already added sevice initialization check. May it will solve your problem.
The next step (today-tomorrow) i will create a special function for service to skip (not by zero command).
And on zero command it will be UPFailedToStart (error for server).
You can already test it in production.
There was not any changes on server side.
All changes just on render side, an not in critical parts.
Tomorrow i will update our farm.
Great news Timur! Thanks for the quick fix! You're the man!