browsermt/bergamot-translator

Translation-specific errors should not crash the service

jelmervdl opened this issue · 5 comments

From the last call:

Should we be dying on HTML parse errors and such? These are query-level errors that should be handled better with an error response.

I proposed to add an ok bool to response, similar to fetch()'s API: https://developer.mozilla.org/en-US/docs/Web/API/Response/ok

I tried something like this, e.g.:

struct Response {
 std::string error;
  bool ok() const { return error.empty(); }
};

// ...

EMSCRIPTEN_BINDINGS(response) {
  class_<Response>("Response")
     // ...
      .property("ok", &Response::ok)
      .property("error", &Response::error);
}

But implementing it in such a way that exceptions get caught and their what() gets put into the correct response.error, especially around the constructor of HTML, is a bit more complicated. All methods that interact with the response objects now need to check whether it is valid by checking its ok().

Alternative names:

  • C++ also uses good() (e.g. iostream::good())
  • or operator bool() of course, but you can't translate that 1-to-1 to Javascript
kpu commented

I thought wasm at least was living in -fno-exceptions territory where we can't throw and catch at all? (And even native code stuffed into the browser is compiled -fno-exceptions). Which would imply using return codes / error states for constructed objects.

That is a good point: https://emscripten.org/docs/optimizing/Optimizing-Code.html#c-exceptions

In that case implementing this is a bit more more complicated, since things like the HTML class need to be altered a bit have a separate parse method that can somehow report error messages.

Things like model loading also reference ABORT_IF. Not sure how we should handle that. Preferably also in a recoverable way (no need to crash the whole service if one model failed to load, right?). Then, how do we report the error? Some sort of rust-like Result<RetType,ErrType> return value? Or error messages by reference with some extra logic in the emscripten bindings to translate that to something Javascript understands?

If possible, I'd prefer to still keep exceptions working as is for the Python and C++ consumers of bergamot-translator. At least, in the places where exceptions make sense. We need something different for AsyncService anyway.

kpu commented

Some of the ABORT_IF we can just keep IMHO. The biggest one is where input breaks stuff like an edge case in HTML.

kpu commented

Now low priority. This works fine in the production C++ API as is, and isn't blocking a WASM prototype.

kpu commented

To be clear, this was only meant to return an error in WASM when provided ill-formed HTML by the extension. The extension should never do that because it must use text mode for text/plain content or pass innerHTML which will be correctly formed.