Support toWtf8Array
wingo opened this issue · 4 comments
For languages that represent their strings as (array i8) internally, it would be good if there were an accelerated primitive for JS string -> wasm string conversion. In that case we would need toWtf8Array. encodeWtf8Array, as in stringref, would be the more general version, but toWtf8Array could be faster (doesn't require a prior measureWtf8) and covers some likely use-cases.
Generally +1 to adding instructions that are useful, so I'm supportive of this request.
I don't feel strongly about the details, and don't think they're worth worrying about. If we go for toWtf8Array, it would internally have to do the equivalent of measureWtf8 before allocating the result array, so I doubt that it would actually be faster. It would be shorter to encode, though, since it needs fewer instructions. As a potential minor drawback, it would probably be the first operation that returns a WasmGC array without expressing the type of that array in the Wasm module (only in the specification); I don't know if that's a concern.
Actually, when measuring the size of the result array, if one detects that the string contains only ASCII characters, one can directly use memcpy to initialize the array. (To detect that, one can just compare the length of the JavaScript string in UTF-16 code units with the number of bytes required to encode the string: every non-ASCII code unit needs at least two bytes, so the two are equal only for pure-ASCII strings.) To do the same optimization with encodeWtf8Array, one would need to scan the string again.
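To make the fast path concrete, here is a rough sketch in Python, not taken from any engine. It assumes the JS string is modelled as a list of UTF-16 code units and a `bytearray` stands in for the `(array i8)`. The names `utf16_units`, `measure_wtf8` and `to_wtf8_array` are hypothetical helpers for illustration only, and the bulk `bytearray(units)` copy stands in for memcpy.

```python
from typing import List, Sequence


def utf16_units(s: str) -> List[int]:
    """Hypothetical helper: view a Python str as UTF-16 code units, like a JS string."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def _is_pair(units: Sequence[int], i: int) -> bool:
    """True if units[i] is a high surrogate followed by a low surrogate."""
    return (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF)


def measure_wtf8(units: Sequence[int]) -> int:
    """Count the WTF-8 bytes needed for a sequence of UTF-16 code units."""
    size, i = 0, 0
    while i < len(units):
        u = units[i]
        if u < 0x80:
            size += 1
        elif u < 0x800:
            size += 2
        elif _is_pair(units, i):
            size += 4          # surrogate pair: one supplementary code point
            i += 1
        else:
            size += 3          # other BMP code unit, including a lone surrogate
        i += 1
    return size


def to_wtf8_array(units: Sequence[int]) -> bytearray:
    """Measure, allocate, then encode; pure-ASCII input takes a bulk-copy path."""
    size = measure_wtf8(units)
    if size == len(units):
        # One byte per code unit means every unit is ASCII: copy in bulk.
        return bytearray(units)
    out = bytearray(size)
    pos, i = 0, 0
    while i < len(units):
        cp = units[i]
        if _is_pair(units, i):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 1
        if cp < 0x80:
            out[pos] = cp
            pos += 1
        elif cp < 0x800:
            out[pos] = 0xC0 | (cp >> 6)
            out[pos + 1] = 0x80 | (cp & 0x3F)
            pos += 2
        elif cp < 0x10000:
            out[pos] = 0xE0 | (cp >> 12)
            out[pos + 1] = 0x80 | ((cp >> 6) & 0x3F)
            out[pos + 2] = 0x80 | (cp & 0x3F)
            pos += 3
        else:
            out[pos] = 0xF0 | (cp >> 18)
            out[pos + 1] = 0x80 | ((cp >> 12) & 0x3F)
            out[pos + 2] = 0x80 | ((cp >> 6) & 0x3F)
            out[pos + 3] = 0x80 | (cp & 0x3F)
            pos += 4
        i += 1
    return out
```

In this sketch the measuring pass is needed anyway to size the allocation, so the ASCII check costs only one extra integer comparison. A separate encodeWtf8Array call would not have that measurement result available unless the caller passed it along or the string were scanned again.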
Indeed, we need a Wasm GC array type that is not associated with any module. Current Wasm GC implementations probably do not expect that.
That's totally fine: the meaning of structural types is independent of modules and of module instances.
I believe this is now fixed with the introduction of the text-encoder/text-decoder namespaces.