WebAssembly/stringref

Fallible UTF-8 decoding?

askeksa-google opened this issue · 2 comments

The UTF-8 decoding API in Dart has an allowMalformed option that specifies the behavior on malformed input. allowMalformed: true emits replacement characters for malformed portions. allowMalformed: false throws an exception on malformed input.
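For readers less familiar with the two modes, here is a minimal sketch of the distinction. It uses Python's built-in decoder as a stand-in, not the Dart API; the function name `decode_utf8` and the `allow_malformed` parameter are hypothetical and only mirror the option described above. The exact number of replacement characters emitted for a given malformed sequence can differ between decoders.

```python
def decode_utf8(data: bytes, allow_malformed: bool = False) -> str:
    """Hypothetical analogue of a decoder with an allowMalformed switch."""
    # "replace" substitutes U+FFFD for malformed input;
    # "strict" raises UnicodeDecodeError instead.
    errors = "replace" if allow_malformed else "strict"
    return data.decode("utf-8", errors=errors)


# Lossy mode: the stray 0xFF byte becomes a replacement character.
lossy = decode_utf8(b"a\xffb", allow_malformed=True)

# Strict mode: the same input raises instead of producing a string.
try:
    decode_utf8(b"a\xffb", allow_malformed=False)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```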

While allowMalformed: true corresponds directly to string.new_lossy_utf8, it seems difficult to implement allowMalformed: false without validating the input beforehand. That extra pass seems silly, since the decoding performs validation anyway.

Would it be possible to add another decoding instruction (e.g. string.new_utf8_try) that fails in a non-trapping manner on malformed input, for instance by producing null?

The instruction does not need to communicate why it failed. The failing case is not considered performance critical, so any diagnostic information that the Dart API wants to provide can be obtained by scanning the input separately once the failure has been established.
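A rough sketch of how a strict decoder could be layered on top of such an instruction, written in Python for illustration only. `new_utf8_try` here is a plain function standing in for the proposed instruction, and `first_malformed_offset` and `decode_strict` are hypothetical helpers, not part of any existing API. The diagnostic scan only runs after a failure, so the common path stays single-pass.

```python
from typing import Optional


def new_utf8_try(data: bytes) -> Optional[str]:
    """Stand-in for the proposed instruction: None on malformed input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def first_malformed_offset(data: bytes) -> Optional[int]:
    """Slow-path scan, used only to build a diagnostic after a failure."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # byte offset where decoding stopped
    return None


def decode_strict(data: bytes) -> str:
    """Hypothetical language-level wrapper that throws on malformed input."""
    result = new_utf8_try(data)
    if result is None:
        # Failure is established; now it is fine to rescan for details.
        offset = first_malformed_offset(data)
        raise ValueError(f"malformed UTF-8 at byte offset {offset}")
    return result
```

With this shape, the wrapper decides how a failure is reported, and the decoding primitive only has to signal that it failed.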

Somewhat related to #19, where in general it seems that trapping operations aren't very useful. Returning null seems fine to me: the caller can then throw an appropriate language-level exception or return null itself. That seems preferable to letting the instruction throw, since such an exception would likely have to be caught and replaced with a language-level exception anyway.

V8 has implemented this as string.new_utf8_try, opcode 0xfb8f:
The instruction traps (out of bounds) on invalid offset / length values.
The instruction returns null on any encoding errors.
Otherwise it is equivalent to string.new_utf8.
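The three cases listed above can be modelled roughly as follows. This is a Python sketch of the described behavior, not V8 code: linear memory is modelled as a `bytes` object, the trap as a hypothetical `WasmTrap` exception, and Python's strict decoder is assumed to be close enough to the validation the instruction performs.

```python
from typing import Optional


class WasmTrap(Exception):
    """Hypothetical stand-in for a WebAssembly trap."""


def string_new_utf8_try(memory: bytes, offset: int, length: int) -> Optional[str]:
    # Invalid offset / length values trap (out of bounds).
    if offset < 0 or length < 0 or offset + length > len(memory):
        raise WasmTrap("out of bounds memory access")
    try:
        # Well-formed input decodes to a string.
        return memory[offset:offset + length].decode("utf-8")
    except UnicodeDecodeError:
        # Any encoding error yields null rather than a trap.
        return None


memory = b"hi\xff!"
ok = string_new_utf8_try(memory, 0, 2)         # valid range, valid UTF-8
malformed = string_new_utf8_try(memory, 0, 3)  # valid range, invalid byte
try:
    string_new_utf8_try(memory, 2, 5)          # range runs past the end
    trapped = False
except WasmTrap:
    trapped = True
```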