Translation-specific errors should not crash the service
jelmervdl opened this issue · 5 comments
From the last call:
Should we be dying on HTML parse errors and such? These are query-level errors that should be handled better with an error response.
I proposed to add an ok
bool to response, similar to fetch()
's API: https://developer.mozilla.org/en-US/docs/Web/API/Response/ok
I tried something like this, e.g.:
struct Response {
std::string error;
bool ok() const { return error.empty(); }
};
// ...
EMSCRIPTEN_BINDINGS(response) {
class_<Response>("Response")
// ...
.property("ok", &Response::ok)
.property("error", &Response::error);
}
But implementing it in such a way that exceptions get caught and their what()
gets put into the correct response.error
, especially around the constructor of HTML
, is a bit more complicated. All methods that interact with the response
objects now need to check whether it is valid by checking its ok()
.
Alternative names:
- C++ also uses
good()
(e.g.iostream::good()
) - or
operator bool()
of course, but you can't translate that 1-to-1 to Javascript
I thought wasm at least was living in -fno-exceptions
territory where we can't throw and catch at all? (And even native code stuffed into the browser is compiled -fno-exceptions
). Which would imply using return codes / error states for constructed objects.
That is a good point: https://emscripten.org/docs/optimizing/Optimizing-Code.html#c-exceptions
In that case implementing this is a bit more more complicated, since things like the HTML class need to be altered a bit have a separate parse method that can somehow report error messages.
Things like model loading also reference ABORT_IF. Not sure how we should handle that. Preferably also in a recoverable way (no need to crash the whole service if one model failed to load, right?). Then, how do we report the error? Some sort of rust-like Result<RetType,ErrType>
return value? Or error messages by reference with some extra logic in the emscripten bindings to translate that to something Javascript understands?
If possible, I'd prefer to still keep exceptions working as is for the Python and C++ consumers of bergamot-translator. At least, in the places where exceptions make sense. We need something different for AsyncService anyway.
Some of the ABORT_IF
we can just keep IMHO. The biggest one is where input breaks stuff like an edge case in HTML.
Now low priority. This works fine in the production C++ API as is, and isn't blocking a WASM prototype.
To be clear, this was only meant to return an error in WASM when provided ill-formed HTML by the extension. The extension should never do that because it must use text mode for text/plain
content or pass innerHTML
which will be correctly formed.