
`legion::Logger::error/fatal` doesn't stop the task.

Calls to `Legion::Logger::error` and `Legion::Logger::fatal` don't stop the execution of a task. In addition, the `legate` executable returns normally even if an error happens inside a task.

Also, is there any guidance on error handling inside tasks? For instance, what's the best way to indicate a task is not running correctly? Or is there a way to emit an error message explaining misuse by the caller?

This appears to have been a conscious decision by @lightsighter and @streichler, so I'll let them comment.

I am not seeing this behavior; whether I insert a crash through an assertion or an uncaught exception, the exit code of the process is always 1:

Most user-caused (and thus recoverable) error conditions should be detected and reported before worker tasks get launched. We typically use standard Python exceptions to report such errors (`ValueError`, `TypeError`, ...), which the user code could ostensibly catch and recover from. The Legate library writer is responsible for checking and sanitizing inputs as much as possible before they reach the task body.
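As a minimal sketch of that pre-launch validation, assuming a hypothetical library function `solve` and a placeholder `_launch_task` (neither is a Legate API; both exist only for illustration), the caller-side checks might look like this:

```python
def _launch_task(matrix, rhs):
    # Hypothetical placeholder for the real task launch; not a Legate API.
    return ("launched", len(matrix), len(rhs))


def solve(matrix, rhs):
    """Validate user inputs on the caller side, then launch the task."""
    if not isinstance(matrix, list) or not all(isinstance(r, list) for r in matrix):
        raise TypeError("matrix must be a list of rows")
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and non-empty")
    if len(rhs) != n:
        raise ValueError(f"rhs has length {len(rhs)}, expected {n}")
    # Only sanitized inputs ever reach the task body.
    return _launch_task(matrix, rhs)


try:
    solve([[1.0, 2.0]], [1.0])  # not square: rejected before any launch
except ValueError as exc:
    print(f"recoverable user error: {exc}")
```

Because the exception is raised before anything is launched, the user can catch it like any other Python exception and retry with corrected inputs.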

Sometimes a problematic input is only detected while running the computation (e.g. during a linear solve it is discovered that the matrix is singular). In that case the task should throw a special `legate::TaskException`, which will get caught and translated to a corresponding exception on the calling side that the user could presumably catch and recover from. However, there are a lot of caveats with this system, so it's preferable to catch user errors before the launch if possible.

Any remaining errors are internal errors that the user cannot do anything about, and they should simply terminate the execution. You can either report an error on the appropriate logger and then call the `LEGATE_ABORT` macro, or throw any exception other than `legate::TaskException`. I believe we are slowly transitioning to the latter.
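The split between recoverable task errors and internal errors can be illustrated with a toy model in plain Python. Everything here is an assumption made for illustration: the `TaskException` class merely stands in for `legate::TaskException`, `run_task` is a hypothetical launcher, and `sys.exit` stands in for `LEGATE_ABORT`. It does not reflect how Legate actually implements the translation.

```python
import logging
import sys

log = logging.getLogger("toy_runtime")


class TaskException(Exception):
    """Hypothetical stand-in for legate::TaskException (recoverable)."""


def run_task(task, *args, abort=sys.exit):
    """Toy launcher: translate TaskException, terminate on anything else."""
    try:
        return task(*args)
    except TaskException as exc:
        # Recoverable: surface it to the caller as an ordinary exception.
        raise RuntimeError(f"task failed: {exc}") from exc
    except Exception as exc:
        # Internal error: report it, then stop the whole program.
        log.error("internal error in task: %s", exc)
        abort(1)


def solve_task(diagonal):
    # Toy "linear solve" on a diagonal matrix; singularity is only
    # discovered while computing, so it is reported from inside the task.
    if any(d == 0 for d in diagonal):
        raise TaskException("matrix is singular")
    return [1.0 / d for d in diagonal]


try:
    run_task(solve_task, [2.0, 0.0])
except RuntimeError as exc:
    print(f"caller recovered: {exc}")
```

The `abort` parameter exists only so the termination path can be exercised without killing the interpreter; in this sketch it defaults to `sys.exit`.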

Note that this is just my interpretation of our current practices, since this is not officially documented anywhere (and that's something we should fix). Inviting @magnatelee and @jjwilke to comment further.

That's correct. Logging infrastructure should not dictate how error handling is performed. The Realm logging infrastructure will report things at different levels, including the error level, but it will not automatically error out your application. Error handling is the responsibility of the client. Legion likewise doesn't specify how errors inside your tasks should be handled, for the same reason: we don't want to force you into handling errors a particular way.
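The same separation exists in Python's standard `logging` module, which makes for a small analogy (this uses Python's logger, not Realm's, and the function names are hypothetical): logging at the error level only records a message, and stopping the computation is a separate, explicit step.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("my_task")


def log_only_divide(numerator, denominator):
    if denominator == 0:
        # Logging only records the problem; execution falls through.
        log.error("denominator is zero")
    return float("nan") if denominator == 0 else numerator / denominator


def checked_divide(numerator, denominator):
    if denominator == 0:
        log.error("denominator is zero")
        # The error handling itself is an explicit, separate decision.
        raise ZeroDivisionError("denominator is zero")
    return numerator / denominator
```

`log_only_divide(1.0, 0)` logs the error and still returns a value, whereas `checked_divide(1.0, 0)` logs and then raises, which is the part the client has to opt into.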

Actually, Realm has a change in the works that will result in fatal messages terminating the application:

The application has the ability to register a callback that Realm will call before terminating the application (e.g. to save user data), but there's no way to "soldier on" from there.

The fatal message level is different from the error message level, though, right?

Correct. Realm uses error when it's going to continue operation, but is pretty certain that the application is not going to produce the intended result.

Ah, I meant an error message is emitted through the logger.

In that case the task should throw a special legate::TaskException

Thank you for sharing.

Thank you all for the informative replies! I will close this issue now that it's clear.