Can encoding instructions use memory as a scratch space?
wingo opened this issue · 2 comments
If, in the implementation of stringrefs in your wasm VM, you have a managed buffer of WTF-8, and the user requests that you write UTF-8 to memory via string.encode_lossy_utf8, one tactic would be to just memcpy the whole thing and then go back and change any surrogate to U+FFFD. (Not saying it's a good strategy, just a possible strategy.) In a single-threaded world, this is fine. Would it be fine with threads? See WebAssembly/threads#189.
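A minimal sketch of the memcpy-plus-fixups tactic, modelling linear memory as a Python bytearray. The function name `encode_lossy_utf8_into` and its signature are hypothetical (not part of any VM or of the stringref proposal), and the input is assumed to be well-formed WTF-8. The tactic can patch in place because a lone surrogate in WTF-8 (`ED A0..BF xx`) and U+FFFD in UTF-8 (`EF BF BD`) are both three bytes long, so a fixup never shifts the rest of the output. WTF-8 also forbids encoding a surrogate pair as two 3-byte sequences, so every such sequence is a lone surrogate.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # U+FFFD encoded as UTF-8 (3 bytes)


def encode_lossy_utf8_into(memory: bytearray, offset: int, wtf8: bytes) -> int:
    """Hypothetical sketch: copy WTF-8 into memory, then patch lone surrogates.

    Assumes `wtf8` is well-formed WTF-8. Returns the number of bytes written.
    """
    n = len(wtf8)
    if offset < 0 or offset + n > len(memory):
        raise IndexError("out-of-bounds store (a wasm VM would trap)")

    # Phase 1: bulk copy of the whole buffer (the "memcpy").
    memory[offset:offset + n] = wtf8

    # Phase 2: walk the source and overwrite each surrogate with U+FFFD.
    i = 0
    while i < n:
        b = wtf8[i]
        if b == 0xED and 0xA0 <= wtf8[i + 1] <= 0xBF:
            # Same length as the surrogate, so later bytes stay where they are.
            memory[offset + i:offset + i + 3] = REPLACEMENT
            i += 3
        elif b < 0x80:
            i += 1  # ASCII
        elif b < 0xE0:
            i += 2  # 2-byte sequence
        elif b < 0xF0:
            i += 3  # 3-byte sequence (including non-surrogate ED 80..9F xx)
        else:
            i += 4  # 4-byte sequence
    return n
```

Phase 2 scans the private source buffer rather than re-reading memory, so a racing writer to the same addresses cannot change which positions get patched. Between the two phases, however, memory briefly holds the raw surrogate bytes.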
IIUC, the complication here, compared to memory.fill, would be the potential for a racing thread to observe the unsanitised surrogate before it's overwritten. Without too much trouble, we could write the specification for string.encode_lossy_utf8 so that this additional behaviour is permitted (i.e. a racing thread could see arbitrary interleavings of the old data, the unsanitised new data, and the sanitised new data), if this is a desirable implementation to support.
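One way to model the relaxed semantics described above is as a per-byte check: each byte a racing reader observes must equal the old contents, the unsanitised WTF-8 copy, or the sanitised result at that position, in any combination. This is a hypothetical sketch, not proposed spec text; the names `sanitise` and `permitted_observation` are invented for illustration, and the input is assumed to be well-formed WTF-8.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # U+FFFD encoded as UTF-8


def sanitise(wtf8: bytes) -> bytes:
    """Replace each lone surrogate (ED A0..BF xx) with U+FFFD; length is unchanged."""
    out = bytearray(wtf8)
    i = 0
    while i < len(wtf8):
        b = wtf8[i]
        if b == 0xED and 0xA0 <= wtf8[i + 1] <= 0xBF:
            out[i:i + 3] = REPLACEMENT
        # Advance by the length of the sequence starting at byte b.
        i += 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
    return bytes(out)


def permitted_observation(observed: bytes, old: bytes, wtf8: bytes) -> bool:
    """Relaxed model: each observed byte comes from the old data, the
    unsanitised copy, or the sanitised result, independently per byte."""
    if not (len(observed) == len(old) == len(wtf8)):
        return False
    final = sanitise(wtf8)
    return all(o in (a, u, f) for o, a, u, f in zip(observed, old, wtf8, final))
```

Under this model, a reader seeing the raw surrogate bytes is allowed, which is exactly the behaviour that a stricter, memory.fill-like specification would rule out.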
> if this is a desirable implementation to support
It's not a big issue either way, but the single-memcpy-plus-fixups implementation is a nice simplification compared to the alternative, so yeah, it would be nice (but not crucial) to support it. You can see the difference here (lines 1274 and following).
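For comparison, here is one plausible shape for the alternative (the linked code itself is not reproduced here): copy runs of bytes that need no change, and write U+FFFD directly in place of each surrogate. Every byte of memory is then written once with its final value, so no racing reader can see an unsanitised surrogate. As before, this is a hypothetical sketch: memory is modelled as a Python bytearray, the input is assumed to be well-formed WTF-8, and the function name is invented for illustration.

```python
REPLACEMENT = b"\xef\xbf\xbd"  # U+FFFD encoded as UTF-8


def encode_lossy_utf8_direct(memory: bytearray, offset: int, wtf8: bytes) -> int:
    """Hypothetical sketch: single pass, never storing a surrogate to memory."""
    n = len(wtf8)
    if offset < 0 or offset + n > len(memory):
        raise IndexError("out-of-bounds store (a wasm VM would trap)")
    run_start = 0  # start of the current run of bytes that need no fixup
    i = 0
    while i < n:
        b = wtf8[i]
        if b == 0xED and 0xA0 <= wtf8[i + 1] <= 0xBF:
            # Flush the clean run, then store U+FFFD instead of the surrogate.
            memory[offset + run_start:offset + i] = wtf8[run_start:i]
            memory[offset + i:offset + i + 3] = REPLACEMENT
            i += 3
            run_start = i
        else:
            i += 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
    # Flush the final clean run.
    memory[offset + run_start:offset + n] = wtf8[run_start:n]
    return n
```

The extra bookkeeping (tracking the start of each clean run and flushing it before every replacement) is the complexity that the single-memcpy-plus-fixups version avoids.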