WebAssembly/stringref

Can encoding instructions use memory as a scratch space?

wingo opened this issue · 2 comments

wingo commented

If, in the implementation of stringrefs in your wasm VM, you have a managed buffer of WTF-8, and the user requests that you write UTF-8 to memory via string.encode_lossy_utf8, one tactic would be to just memcpy the whole thing, and then go back and change any surrogate to be U+FFFD. (Not saying it's a good strategy, just a possible strategy.) In a single-threaded world, this is fine. Would it be fine with threads? See WebAssembly/threads#189.

IIUC, the complication here in comparison to memory.fill would be the potential for a racing thread to observe the unsanitised surrogate before it's overwritten. We could write the specification for string.encode_lossy_utf8 so that this additional behaviour is permitted without too much trouble if this is a desirable implementation to support (i.e. a racing thread could see arbitrary interleavings of the old data, the unsanitised new data, and the sanitised new data).

if this is a desirable implementation to support

It's not a big issue either way, but the single-memcpy-plus-fixups implementation is a nice simplification compared to the alternative, so yeah, it would be nice (but not crucial) to support it. You can see the difference here (lines 1274 and following).