WebAssembly/design

Builtins and a JS String API

eqrion opened this issue · 42 comments

Today it’s possible to use any JavaScript or Web API by importing JS ‘glue’ code which adapts between WebAssembly and JavaScript values and calling conventions. Usually this has a negligible performance impact, and work has been done to optimize it wherever we can.

However, this glue code can have a significant impact on primitives such as JavaScript Strings, ArrayBuffers, RegExp, Map, and BigInt, where the overhead of calling an unknown import can be too high for common operations.

With compile-time-imports, it is possible for Web engines to specialize to a known import when compiling a module. With type-imports, we can omit type checks in certain circumstances. This proposal builds on both to describe ‘builtins’ which can be used by these proposals for efficient access to Web primitives. The name is subject to change; see the open questions at the bottom.

Do we need builtins?

One interesting option would be for engines to pattern match on import values to well-known APIs. You could imagine an engine recognizing an import of String.prototype.charCodeAt and emitting efficient code for it.

The main problem with this is that existing APIs require a calling convention conversion to handle differences around the this value. The WebAssembly.Function interface (from the js-type-reflection proposal) could be extended to allow specifying the JS receiver.
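For illustration, here is a sketch of what that extension could look like; the third argument is invented for this sketch and is not part of js-type-reflection today:

let charCodeAt = new WebAssembly.Function(
  { parameters: ['externref', 'i32'], results: ['i32'] },
  String.prototype.charCodeAt,
  { receiver: 'first-argument' } // hypothetical option: pass the first wasm argument as `this`
);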

However, then there is a new question around specifying shareability, which is required for the compile-time-imports proposal. It also seems desirable to limit the depth of any pattern matching to at most a single level of value. This simplifies engines and prevents performance cliffs when the import value is not in the right shape.

A secondary problem is that certain existing JS APIs have some inefficiencies when used directly. String methods, for example, accept either a string object wrapper or a string primitive value. Specializing directly to these methods would require faithfully duplicating this logic, even when it’s unnecessary for WebAssembly callers.

For these reasons, I think a cleaner design is to introduce new definitions that can be imported and are tailored to WebAssembly's needs.

Builtins overview

A builtin is a definition on the WebAssembly namespace that can be imported by a module and provides efficient access to a JavaScript or Web primitive. There are two kinds of builtins: functions and types. Type builtins are only available with the type-imports proposal.

Builtins do not provide any new abilities to WebAssembly. They merely wrap existing primitives in such a manner that WebAssembly can efficiently use them.

The standardization of builtins will be governed by the WebAssembly standards process and would exist in the JS-API document.

The bar for adding a new builtin would be that it enables significantly better code generation for an important use-case beyond what is possible with a normal import.

Function builtins

Function builtins would be an instance of WebAssembly.Function and have a function type.

Their behavior would be defined using algorithmic steps similar to the WebIDL or ECMAScript standards. If possible, we could define them using equivalent JavaScript source code to emphasize that these do not provide any new abilities.

Type builtins

Type builtins would be an instance of the WebAssembly.Type interface provided by the type-imports proposal. The values contained in a type builtin would be specified with a predicate.

WebAssembly.String API

The most pressing use-case is a String builtin API. The following is a sketch of a potential API. The function definitions use a hybrid of wasm and JS syntax, as they adapt between the two worlds. We would need to use something more formal for a final definition.

namespace WebAssembly.String {
    WebAssembly.BuiltinType type;

    WebAssembly.Function fromCharCode;
    WebAssembly.Function fromCodePoint;
    WebAssembly.Function codePointAt;
    WebAssembly.Function charCodeAt;
    WebAssembly.Function length;

    WebAssembly.Function concat;
    WebAssembly.Function slice;

    WebAssembly.Function equals;

    WebAssembly.Function encodeIntoMemoryUTF8;
    WebAssembly.Function decodeFromMemoryUTF8;
}

WebAssembly.String.type

An importable type that refers to JavaScript string primitives. This intentionally excludes JavaScript string wrapper objects.

function WebAssembly.String.type.includes(value) {
	return typeof value === 'string';
}

WebAssembly.String.fromCharCode

function WebAssembly.String.fromCharCode(charCode: i32) -> (ref null String.type) {
	charCode = ToJSValue(charCode);
	let result = String.fromCharCode(charCode);
	return ToWebAssemblyValue(result, 'ref null String.type');
}

WebAssembly.String.fromCodePoint

function WebAssembly.String.fromCodePoint(codePoint: i32) -> (ref null String.type) {
	codePoint = ToJSValue(codePoint);
	let result = String.fromCodePoint(codePoint);
	return ToWebAssemblyValue(result, 'ref null String.type');
}

WebAssembly.String.codePointAt

function WebAssembly.String.codePointAt(
string: (ref null String.type),
index: i32
) -> i32 {
	string = ToJSValue(string);
	index = ToJSValue(index);
	let result = string.codePointAt(index);
	return ToWebAssemblyValue(result, 'i32');
}

WebAssembly.String.charCodeAt

function WebAssembly.String.charCodeAt(
string: (ref null String.type),
index: i32
) -> i32 {
	string = ToJSValue(string);
	index = ToJSValue(index);
	let result = string.charCodeAt(index);
	return ToWebAssemblyValue(result, 'i32');
}

WebAssembly.String.length

function WebAssembly.String.length(
string: (ref null String.type)
) -> i32 {
	string = ToJSValue(string);
	let result = string.length;
	return ToWebAssemblyValue(result, 'i32');
}

WebAssembly.String.concat

function WebAssembly.String.concat(
first: (ref null String.type),
second: (ref null String.type)
) -> (ref null String.type) {
	first = ToJSValue(first);
	second = ToJSValue(second);
	let result = first.concat(second);
	return ToWebAssemblyValue(result, 'ref null String.type');
}

WebAssembly.String.slice

function WebAssembly.String.slice(
string: (ref null String.type),
indexStart: i32,
indexEnd: i32
) -> (ref null String.type) {
	string = ToJSValue(string);
	indexStart = ToJSValue(indexStart);
	indexEnd = ToJSValue(indexEnd);
	let result = string.slice(indexStart, indexEnd);
	return ToWebAssemblyValue(result, 'ref null String.type');
}

WebAssembly.String.equals

function WebAssembly.String.equals(
first: (ref null String.type),
second: (ref null String.type)
) -> i32 {
	first = ToJSValue(first);
	second = ToJSValue(second);
	let result = first === second ? 1 : 0;
	return ToWebAssemblyValue(result, 'i32');
}

WebAssembly.String.encodeIntoMemoryUTF8

function WebAssembly.String.encodeIntoMemoryUTF8(
string: (ref null String.type),
startIndex: i32,
memory: (ref null WebAssembly.Memory.type)
) -> [i32, i32] {
	string = ToJSValue(string);
	startIndex = ToJSValue(startIndex);
	memory = ToJSValue(memory);
	let encoder = new TextEncoder();
	let dest = new Uint8Array(memory.buffer).subarray(startIndex);
	let result = encoder.encodeInto(string, dest);
	let read = result.read;
	let written = result.written;
	return [
		ToWebAssemblyValue(read, 'i32'),
		ToWebAssemblyValue(written, 'i32')
	];
}

WebAssembly.String.decodeFromMemoryUTF8

function WebAssembly.String.decodeFromMemoryUTF8(
startIndex: i32,
lengthBytes: i32,
memory: (ref null WebAssembly.Memory.type)
) -> (ref null String.type) {
	startIndex = ToJSValue(startIndex);
	lengthBytes = ToJSValue(lengthBytes);
	memory = ToJSValue(memory);
	let decoder = new TextDecoder();
	let source = new Uint8Array(memory.buffer).subarray(startIndex, startIndex + lengthBytes);
	let result = decoder.decode(source);
	return ToWebAssemblyValue(result, 'ref null String.type');
}

Open Questions

Is builtin the best name?

The name ‘builtin’ may mislead some people into thinking these functions expose new abilities not available to JavaScript. They are actually much closer to the concept of ‘adapters’; however, that name is currently in use by the component model.

Where should builtins be defined?

Defining a sub-namespace for each builtin is clean, but risks confusion that these are actually JavaScript classes. It would be nice to avoid questions of whether instanceof WebAssembly.String is valid or not.

wingo commented

I think that builtins can be a great part of the toolkit for allowing a WebAssembly implementation to accelerate compilation of host-provided primitives. Thinking about it a bit I see builtins this way, relative to the state of the art:

Neutral

  1. For charCodeAt et al, currently we have to use Function.bind and so on: WebAssembly/stringref#5 (comment). This works fine as long as the functionality is of the right shape already.

  2. Builtins are independent of compile-time imports or pre-imports.

    If compile-time imports are not available, one strategy that can be used is to only specialize on builtins for the optimizing tiers, and have the baseline tier call out to builtins. Then you can cache both baseline code and optimized code, and if a subsequent compile doesn't have the expected builtins, you throw away just the optimized code. Compile-time or pre-imports remove this source of dynamism, but aren't necessary for the builtins subproposal.

Pros

  1. It's nice to be able to expand the set of builtins without e.g. noodling on the String object; having a place for builtins is good.

Cons (builtins)

  1. Builtins are an addition to the web platform, which comes with its own externalities. Presumably users can call them from JS also.

Cons (for the concrete set of builtins being proposed above for strings)

  1. Using this set of builtins effectively standardizes JavaScript strings. This has a number of drawbacks:

    1. Source languages that want to use WebAssembly strings would not have them available on implementations where JS is not the host. Kotlin would have to compile in at least two different ways, and you are still stuck with the "how do I use a host regexp engine" problem.
    2. Source languages that want Wasm strings but don't care about UTF-16 (e.g. Scheme) are forced to access string contents through expensive codepointAt operations. This can turn O(n) loops into O(n^2) loops.
  2. Relative to a system with views like stringref, the cost model is not as expressive for a source compiler. For example with the stringref proposal, a source compiler can produce code that gets a view outside a loop and then access the view inside a loop. In V8 for example we eagerly flatten strings when getting a UTF-16 view and unpack some raw element and length pointers. But for a string that just passes through wasm or which is accessed in some other way, no eager flattening is needed.

  3. No GC memory builtins (to/from WTF-16/WTF-8 arrays in GC memory) in the set of builtins above.

  4. No iterator interface in the set of builtins above.

  5. UTF-8 interfaces lack expressivity relative to e.g. string.new_utf8 / string.new_wtf8 / string.new_lossy_utf8 and related functionality.

Thoughts

If the string problem is worth solving, it is probably worth solving not just for source languages that treat strings as UTF-16, and in a way that can be implemented on wasm outside JS hosts. I think that if this proposal ships as-is it runs the risk of being superseded in some years by some more expressive API, while at the same time probably delaying that more expressive API.

Thanks for this feedback, responses inline.

1. For `charCodeAt` et al, currently we have to use `Function.bind` and so on: [Library functionality: toUpperCase, etc stringref#5 (comment)](https://github.com/WebAssembly/stringref/issues/5#issuecomment-1243487288).  This works fine as long as the functionality is of the right shape already.

Oh, I see, you can combine Function.call with Function.bind and import that.
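That is, something like this sketch:

// Uncurry the receiver so the string becomes an ordinary first argument,
// matching a wasm import signature of (externref, i32) -> i32.
const charCodeAt = Function.prototype.call.bind(String.prototype.charCodeAt);
charCodeAt('abc', 1); // 98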

So that works with receivers, but it doesn't work with other cases such as JS operators (===) or more macro operations like the text encode/decode.

Another concern I have with this is the potential for the pattern matching to be slightly incorrect, causing engines to miss specializing to something they should. With builtins, I think a good expectation would be that if they are available and you compile-time import them, the engine must specialize to them.

There's also still the question of shareability: if this import value is exported, how does it materialize on other Web Workers? Which JS global definitions can you send this way?

2. Builtins are independent of compile-time imports or pre-imports.
   If compile-time imports are not available, one strategy that can be used is to only specialize on builtins for the optimizing tiers, and have the baseline tier call out to builtins.  Then you can cache both baseline code and optimized code, and if a subsequent compile doesn't have the expected builtins, you throw away just the optimized code.  Compile-time or pre-imports remove this source of dynamism, but aren't necessary for the builtins subproposal.

Yeah, agreed that doing this specialization isn't impossible without compile-time imports. But relying on that technique would force engines to use a baseline tier even if they don't deem it profitable due to the module being small (such as for JITs).

Cons (builtins)

1. Builtins are an addition to the web platform, which comes with its own externalities.  Presumably users can call them from JS also.

Do you have specific externalities in mind?

I agree that adding a new concept is not to be taken lightly. The main risk in my opinion is that it becomes an over-used extension point for stuffing in new things to the web platform. A mitigation for that would be to force all builtins to be defined in terms of existing JS primitives and demonstrate a clear performance benefit over normal imports. These two things make builtins sort of an adaptation layer.

Cons (for the concrete set of builtins being proposed above for strings)

1. Using this set of builtins effectively standardizes JavaScript strings.  This has a number of drawbacks:
   
   1. Source languages that want to use WebAssembly strings would not have them available on implementations where JS is not the host.  Kotlin would have to compile in at least two different ways, and you are still stuck with the "how do I use a host regexp engine" problem.

This is an important point that probably needs more discussion.

My personal opinion is that the Web is unique and difficult enough to target that any serious port will do special things on the Web compared to non-JS hosts. You already need special builds for how you interface with APIs (Web vs component-model vs hand-written ABIs). I don't think it's unreasonable for there to be special builds for using host-provided types when they match up with your source language types. And simple ports wouldn't need to do this; they could just get by with copies on the boundary if they wanted to get a quick initial release out.

I do think there is an interesting space around polyfills here. You could imagine a JS string builtin polyfill that's applied as a toolchain-agnostic build step and could help minimize work for toolchains when porting off of the Web. This would be analogous to how some folks want to polyfill WASI on the Web in certain circumstances.

   2. Source languages that want Wasm strings but don't care about UTF-16 (e.g. Scheme) are forced to access string contents through expensive `codepointAt` operations.  This can turn O(n) loops into O(n^2) loops.

What is Scheme using in the stringref proposal that's missing here? Would having a builtin around String.prototype[@@iterator]() solve this?
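Sketching the shape I have in mind (all names hypothetical):

function iterate(string) {
  return string[Symbol.iterator]();
}

function next(iterator) {
  let { value, done } = iterator.next();
  // A real design would need a better signal for exhaustion than -1.
  return done ? -1 : value.codePointAt(0);
}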

2. Relative to a system with views like `stringref`, the cost model is not as expressive for a source compiler.  For example with the stringref proposal, a source compiler can produce code that gets a view outside a loop and then access the view inside a loop.  In V8 for example we eagerly flatten strings when getting a UTF-16 view and unpack some raw element and length pointers.  But for a string that just passes through wasm or which is accessed in some other way, no eager flattening is needed.

Does stringref have a cost model? My understanding from WebAssembly/stringref#56 is that it does not, and that was seen as desirable. For example, with string.concat, source language compilers would have to code to specific implementations for the best way to use that instruction.

3. No GC memory builtins (to/from WTF-16/WTF-8 arrays in GC memory) in the set of builtins above.

4. No iterator interface in the set of builtins above.

These are good points, I skipped them due to time constraints but I'll see if I can come up with something.

5. UTF-8 interfaces lack expressivity relative to e.g. `string.new_utf8` / `string.new_wtf8` / `string.new_lossy_utf8` and related functionality.

What do you mean by expressivity here? Is there a string constant that's harder to generate using JS string literals?

Thoughts

If the string problem is worth solving, it is probably worth solving not just for source languages that treat strings as UTF-16, and in a way that can be implemented on wasm outside JS hosts. I think that if this proposal ships as-is it runs the risk of being superseded in some years by some more expressive API, while at the same time probably delaying that more expressive API.

From the meeting on Tuesday it sounds like there are two separate string problems:

  1. Efficient access to host APIs that use strings
  2. Portable string implementation for toolchains to use for their source level string

There's a conflict in these goals though, because the most efficient thing for (1) is going to be different per-host leading to non-portability in (2).

stringref solves this by making the optimal thing for the Web (JS strings) portable as core wasm instructions.

I think instead that we should allow for optional specialization to hosts when they need this kind of performance, and give up some amount of portability. I don't think this loss is that significant, because toolchains already need separate builds for Web/non-Web. And this allows us to go further than just Strings to things like UInt8Array.

wingo commented

Greets :) At the risk of going back-and-forth, here's a long reply. I appreciate the time you have put in thinking about this problem and your willingness to explore the design space, and I am trying to keep an open mind on my side as well.

Oh, I see, you can combine Function.call with Function.bind and import that. That works with receivers, but it doesn't work with other cases such as JS operators (===) or more macro operations like the text encode/decode.

Hum, good points, though for === you could bind Object.is with string-typed parameters. Of course any functionality that's not in the web platform would have to be added to the web platform.
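That is, roughly this sketch, where the engine could specialize the import when the declared parameter types are strings:

// Object.is on two strings behaves like ===; its boolean result is
// coerced to i32 (0 or 1) at the wasm boundary by the JS-API.
const imports = { js: { equals: Object.is } };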

There's also still the question of shareability, if this import value is exported how does it materialize on other Web Workers? Which JS global definitions can you send this way?

I guess the only way would be to send it "by name", which isn't transparent as regards exports. I just meant to point out that there are solutions.

Do you have specific externalities in mind?

Just referring to the ongoing maintenance effort needed on additional capabilities exposed to JS. There is also the extra cost, of course, of making an interface that is JS-affine, whereas something made for wasm might have a different shape.

My personal opinion is that the Web is unique and difficult enough to target that any serious port will do special things on the Web compared to on non-JS hosts.

Yeah, no argument that there will be declinations for different platforms. To me the question is, what do we want it to look like when targeting e.g. Java/C#/Kotlin/* to wasm, regardless of target?

To my mind you would want the ability for these languages to use host-provided string libraries: ICU, regexp, and so on, and to have known-callee optimizations on those facilities. This applies to the web, which your builtins proposal targets, but also on non-web.

Of course if a target platform has a more minimal approach, sure, I agree there is space for a toolchain to polyfill a module's use of a string facility, shipping ICU and having regexp without JIT. But for reasons of binary size and ultimate throughput I would want to use host strings if I could.

If we focus on the web use case, I think we end up requiring this second "bundling" approach for non-web. I think it's less good than allowing for host-provided string support.

On the other hand if a non-web wasm implementation ends up providing a host string facility, we have three possibilities I think. One is that it is exactly like JS strings; that works for Java, but is exactly what we wanted to avoid by pushing strings out to builtins. Another is that it is more tailored to another string representation, e.g. Rust's; then it is not so useful for Java and related languages. Or, it could end up implementing something like stringref, which introduces some dynamism to allow source language compilers to express how they would like to access string contents.

What is Scheme using in the stringref proposal that's missing here? Would having a builtin around String.prototype[@@iterator]() solve this?

Buf, things are early days, I don't mean to make too strong a statement here. But yes wrapping the String iterator would help, it would also help to be able to advance an iterator by an arbitrary number of codepoints.

Does stringref have a cost model? My understanding from WebAssembly/stringref#56 is that it does not, and that was seen as desirable. For example, with string.concat, source language compilers would have to code to specific implementations for the best way to use that instruction.

The main model is that if you obtain a view, you then have the fastest access to contents. I know it's not precise, but it's something.

I think this is what I was getting at in my comment in the meeting -- let's imagine we leave string.concat out because we want to avoid cementing the JS string cost model. In fact let's leave out most all operations that actually access string contents. Instead we import these operations from the host, which is JS on the web. But engines will specialize appropriately, with the end situation that users expect the JS string de-facto cost model. So what would we gain by leaving it out? And what would we gain by specifying an algorithmic complexity for string.concat if we want to allow a range of implementation effort?

  1. UTF-8 interfaces lack expressivity relative to e.g. string.new_utf8 / string.new_wtf8 / string.new_lossy_utf8 and related functionality.

What do you mean by expressivity here? Is there a string constant that's harder to generate using JS string literals?

Say you have a WTF-16 string from Java and you want to pass it to Rust. What do you do for an isolated surrogate? You can replace it with U+FFFD, encode it as WTF-8, or throw an error. Different languages / use-cases might make different choices. Having these options is what I mean by expressivity.

From the meeting on Tuesday it sounds like there are two separate string problems:

1. Efficient access to host APIs that use strings
2. Portable string implementation for toolchains to use for their source level string

There's a conflict in these goals though, because the most efficient thing for (1) is going to be different per-host leading to non-portability in (2).

stringref solves this by making the optimal thing for the Web (JS strings) portable as core wasm instructions.

I think it's actually not about JS -- it's about source languages! The reason there is a WTF-16 view in the stringref proposal is mainly because of Java et al. If some source languages didn't want to access strings in this way we could use a more pure iterator-based API that only treats strings as sequences of USVs. The fact that there is a nice zero-copy story with Java strings and the web platform is a bonus, but is secondary.


To return to your point about the cost model: honestly, I see it playing out like this: if WebAssembly-on-the-web ends up with a string facility that allows zero-copy strings for source languages that treat their strings as UTF-8, and this facility is useful and widely used, then the usual competitive pressures in browsers will over the next 3-5 years make it so all engines have either a UTF-8 representation for strings or a breadcrumbs solution. This will bring its own benefits for memory in general, given that the network is UTF-8, and other system components are moving in a UTF-8 direction.

OK, so web browsers have their 7 or 13 or whatever it is representations for strings, and also we add some for UTF-8 (and slices, ropes, etc). Non-web will start out by using toolchains to bundle ICU et al, but the best quality implementations eventually finish by doing the same as browsers by supporting both WTF-16 and WTF-8 as well as ropes and slices, though they might breadcrumb in the other direction, with WTF-8 as primary.

In this world, then, though the cost model is never specified, we converge on O(log n) or better code unit access for both WTF-16 and WTF-8, and potentially O(n) cost when first accessing the "foreign" encoding for the implementation. This to me is quite an OK end state. I could be overlooking many things though and your thoughts are most welcome here!

I feel that "stringref doesn't have a cost model" is a misrepresentation/misunderstanding of what I said in WebAssembly/stringref#56. All of the operations you asked about there (creating views, concat, eq, measure_*, is_usv_sequence) can easily be specified as "no worse than O(n)".
In practice, many of them will often be faster than that, but that's due to implementation-specific optimizations, and like all optimizations isn't formally specified/guaranteed. As a different example, consider a function that returns (local.get 0) + 1 + 1 + ... + 1, with n repetitions of + 1 (expressed as (i32.add (...) (i32.const 1)) of course). What's the cost model for this function? In a baseline compiler or wire-bytes interpreter, it'll be O(n), which is of course the only reasonable statement that the Wasm spec can make about its cost. In an optimizing compiler or pre-processing interpreter with constant-folding support, it'll be O(1), which is the majority case in production scenarios.

FWIW, we haven't specified cost models for other instructions in the past either. For example, array.new_data could be implemented as O(1) when the element type isn't mutable; but array.new_data doesn't have a formal cost model, and the obvious default assumption is that it'll be bounded by O(n), and anything faster than that is an implementation-specific optimization that may or may not happen.
Another interesting example is ref.cast for a subtype that's n levels deep. In this case, we have specifically designed it such that it can be implemented with O(1) cost, but that's not formally required, and the simplest conceivable implementation indeed has linear cost (and is perfectly legal by the spec).

stringref solves this by making the optimal thing for the Web (JS strings) portable as core wasm instructions.

I feel that that is a misrepresentation/misunderstanding of the stringref proposal. It could be so much simpler if it only cared to express JS strings, but (in my and many others' opinions) that would be a poor design strategy for a Wasm feature. Instead, it caters to the needs of source languages, and as a consequence gives equal consideration to UTF-8 strings and WTF-16 strings.

On the other hand if a non-web wasm implementation ends up providing a host string facility, we have three possibilities I think. One is that it is exactly like JS strings; that works for Java, but is exactly what we wanted to avoid by pushing strings out to builtins. Another is that it is more tailored to another string representation, e.g. Rust's; then it is not so useful for Java and related languages. Or, it could end up implementing something like stringref, which introduces some dynamism to allow source language compilers to express how they would like to access string contents.

I'm not clear on how the WTF8 view interface will be used, so I'm going to file an issue on the stringref proposal to discuss this more.

The main model is that if you obtain a view, you then have the fastest access to contents. I know it's not precise, but it's something.

Sure, that's reasonable. The issue is more around the cost of obtaining a view. In SpiderMonkey for the foreseeable future, acquiring a wtf8_view will always be an allocation and transcode, while acquiring a wtf16_view will be a flattening if accessing a rope or else a no-op. While the explainer seems to indicate that non-Web implementations would likely have the opposite.

This seems to be a pretty big incompatibility that would make it very challenging to write portable code with the expected performance behavior.

I think this is what I was getting at in my comment in the meeting -- let's imagine we leave string.concat out because we want to avoid cementing the JS string cost model. In fact let's leave out most all operations that actually access string contents. Instead we import these operations from the host, which is JS on the web. But engines will specialize appropriately, with the end situation that users expect the JS string de-facto cost model. So what would we gain by leaving it out?

Just to be clear, I would not support adding a stringref type in a world where access needs to be imported, I don't think that's any different than going the stringref route for the reasons you note.

However, I believe it is meaningfully different if there is an importable JS string type and operations on the Web. The reason is that we're not forcing non-Web engines to implement this, and it's expected that toolchains will need to bring their own implementation if they're targeting these platforms. And by bringing their own implementation, they can get predictable performance by emulating exactly how JS string works on the Web.

And what would we gain by specifying an algorithmic complexity for string.concat if we want to allow a range of implementation effort?

The key thing I am concerned about is the performance incompatibility that can arise when implementations use significantly different techniques. So far wasm instructions have been machine and data oriented enough that optimizing instructions generally doesn't change user code from O(n^2) to/from O(n). Adding a more abstract type that hides the data representation details like this, now opens up this issue.

In practice, many of them will often be faster than that, but that's due to implementation-specific optimizations, and like all optimizations isn't formally specified/guaranteed. As a different example, consider a function that returns (local.get 0) + 1 + 1 + ... + 1, with n repetitions of + 1 (expressed as (i32.add (...) (i32.const 1)) of course). What's the cost model for this function? In a baseline compiler or wire-bytes interpreter, it'll be O(n), which is of course the only reasonable statement that the Wasm spec can make about its cost. In an optimizing compiler or pre-processing interpreter with constant-folding support, it'll be O(1), which is the majority case in production scenarios.

I think there's a difference between constant folding and the situation with stringref. Constant folding can be done by source compilers (and is generally expected to). In addition, the difference in execution time grows as the number of constants in the function body grows, while with stringref, the difference in execution time grows with the actual length of values at runtime. And strings can be much larger than function bodies.

I think a more analogous situation is when the operands to instructions are dynamic. For integer addition when the operands are dynamic, the best implementations can generally do is emit a machine add instruction. For string concatenation however, it's possible to use lazy evaluation techniques which will significantly change the runtime performance of applications. That's a new bridge we're crossing with this proposal.

It's not so much about having a precise cost-model, but more about having instructions that are close enough to hardware and concrete data representations so that wasm engines don't need lazy evaluation, hash consing, etc, to get good enough performance.

FWIW, we haven't specified cost models for other instructions in the past either. For example, array.new_data could be implemented as O(1) when the element type isn't mutable; but array.new_data doesn't have a formal cost model, and the obvious default assumption is that it'll be bounded by O(n), and anything faster than that is an implementation-specific optimization that may or may not happen.

How could array.new_data be O(1)? Even if the elements are immutable, you still need to allocate a fresh object and do an initialization as GC objects have identity.

Another interesting example is ref.cast for a subtype that's n levels deep. In this case, we have specifically designed it such that it can be implemented with O(1) cost, but that's not formally required, and the simplest conceivable implementation indeed has linear cost (and is perfectly legal by the spec).

Yes, ref.cast was designed with a specific implementation technique in mind even though it's not formally required. The performance gap for implementations that do not use this technique, however, is so significant that it is effectively required. And that's fine, because the complexity of O(1) casting is reasonable IMO. It is just some statically computable metadata, an index into an array, and a comparison. The techniques we use for JS Strings are an order of magnitude more complicated.

How could array.new_data be O(1)? Even if the elements are immutable, you still need to allocate a fresh object and do an initialization as GC objects have identity.

Implementations could have an array representation where the actual data is a pointer indirection away. (Like a JS TypedArray and its underlying ArrayBuffer.) So yes, you'd allocate a new (fixed-size) object with identity, and initialize it, and have its data pointer point straight at the module's data segment. That's O(1), allocations of arrays with length 1 or 100 would have the exact same cost.
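In JS terms, roughly this analogy:

// Creating a view is O(1) regardless of length: only a fixed-size header
// (which carries the identity) is allocated; no element data is copied.
let segment = new ArrayBuffer(1024); // stands in for a module data segment
let view = new Uint8Array(segment, 0, 100); // points straight at the bytes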

How could array.new_data be O(1)? Even if the elements are immutable, you still need to allocate a fresh object and do an initialization as GC objects have identity.

Implementations could have an array representation where the actual data is a pointer indirection away. (Like a JS TypedArray and its underlying ArrayBuffer.) So yes, you'd allocate a new (fixed-size) object with identity, and initialize it, and have its data pointer point straight at the module's data segment. That's O(1), allocations of arrays with length 1 or 100 would have the exact same cost.

Ah, I see, interesting. I still believe that's at a different complexity level than the string optimizations we're talking about though.

Arrays would not switch to/from this special representation dynamically (as strings would). And while I would be concerned about the overhead of the extra indirection, at the least it doesn't involve case analysis on every operation.

I filed WebAssembly/stringref#62 for the UTF-8 question I mentioned.

My take on the set of builtins sketched out above:

Important things that are missing

(1) Conversion to and from array-of-i16. Per J2Wasm's experience, these are the two most important string-related operations, because they are very common, and there is no fast existing alternative. This is presumably not difficult to add.
There are (at least) two possibilities how to design the details:

  • "conversion semantics": signatures are [(ref null WA.String.type)] -> [(ref $array16)] and [(ref null $array16)] -> [(ref WA.String.type)], respectively (for a (type $array16 (array (mut i16)))).
  • "load/store semantics": views the array more like a linear memory, signatures are [(ref null WA.String.type), (ref null $array16), i32] -> [] (meaning: [source_string, destination_array, destination_start_index]; could optionally return the number of codeunits written) and [(ref null $array16), i32, i32] -> [(ref WA.String.type)] (meaning: [source_array, source_start_index, source_end_index] -> [result_string]; could optionally use "length" instead of "end_index" but I think the latter is the more modern convention).

FWIW, the stringref proposal argues that the second option is more generally useful because it's more configurable (allows you to e.g. concatenate multiple strings' contents into a single array). I think the differences between the two design variants matter much less than having something along these lines.
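For concreteness, a JS sketch of the "load/store semantics" variant, with a plain array standing in for the (ref $array16); all names are invented:

function intoCharCodeArray(string, array, start) {
  if (typeof string !== 'string') {
    throw new WebAssembly.RuntimeError('invalid cast');
  }
  for (let i = 0; i < string.length; i++) {
    array[start + i] = string.charCodeAt(i); // copy one UTF-16 code unit
  }
  return string.length; // number of code units written
}

function fromCharCodeArray(array, start, end) {
  let result = '';
  for (let i = start; i < end; i++) {
    result += String.fromCharCode(array[i]);
  }
  return result;
}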

(2) An equivalent of a string.const instruction, i.e. some way to use string constants in global initializers. The Dart team says that this is very important for Dart/Flutter. This seems difficult to address (as in: I don't see a way), but maybe you have an idea?

(3) Type checks in all imported functions. This should be easy to add: any time one of these builtins takes a string parameter, it should do something like if (typeof string_param !== "string") { throw new WebAssembly.RuntimeError("invalid cast"); } (or, in Wasm terms, what (ref.as string) would do if there was a built-in string type).
Reason: I expect that there'll be a (possibly years-long) period where we already have the WebAssembly.String.* builtin functions, but haven't finalized a type-imports design yet, and during this time have to use externref in place of (ref null? WebAssembly.String.type), and to generate fast optimized code for the operations reflected by the builtins you really want to be able to just trap/throw when the incoming externref isn't a string. (I've prototyped and benchmarked this with and without type checks, and the performance difference was huge; I expect that when you find the time to prototype this, you'll observe the same.)
When we do get type imports, these extra checks will become unobservable (and trivially easy to optimize out even by baseline/single-pass compilers, just based on static type knowledge).
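Concretely, each string-taking builtin would be shaped like this sketch (using charCodeAt as the example):

function charCodeAt(string, index) {
  // A (ref.as string)-style guard; trivially removable once type imports
  // give the engine static knowledge of the parameter type.
  if (typeof string !== 'string') {
    throw new WebAssembly.RuntimeError('invalid cast');
  }
  return string.charCodeAt(index);
}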

Nice-to-have things that are missing

(4) The ability to express possibly-expensive operations (such as flattening a string before indexed accesses, the equivalent of string.as_wtf16) as explicit operations. Not a huge deal in practice because sufficiently good optimizing compilers can handle it well either way, but as you know, Wasm spec folks often say it's desirable when potentially-costly steps are explicit.

(5) Support for UTF-8 based managed languages (that want to compile to WasmGC). That's a niche right now so in the near-term future it's not a huge problem, but it's worth pointing out that that's something this approach doesn't offer, and also doesn't have an obvious path towards offering in the future.

(6) A compare(strA, strB) operation that returns -1/0/1. (That doesn't subsume equals, because the latter can be faster since it statically knows that it only needs to check for equality. For example, an equals implementation can immediately return false/0 when the lengths of the inputs are unequal, whereas a compare implementation would still have to look at actual codeunits to determine whether it's a case of "aa" < "aaa" or "ab" > "aaa". Also, when iterating over codeunits, equals only needs one conditional branch for each codeunit whereas compare needs two.)
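A JS sketch of the intended semantics (name hypothetical):

function compare(first, second) {
  if (typeof first !== 'string' || typeof second !== 'string') {
    throw new WebAssembly.RuntimeError('invalid cast');
  }
  // JS comparison operators already compare strings lexicographically by code unit.
  return first < second ? -1 : first > second ? 1 : 0;
}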

(7) If we want to entirely obviate the need for custom JS imports, we'd also need:

  • String.FromI32 (JS equivalent: (x|0).toString()).
    (Note 1: This is probably the most performance-relevant entry in this list.)
    (Note 2: I haven't seen a need for "string-to-i32" but maybe that's just me not having seen it.)
  • String.FromF64 (JS: x.toString()), String.ToF64 (JS: parseFloat(str))
  • toLowerCase(str) / toUpperCase(str) / toLocaleLowerCase(str, locale) / toLocaleUpperCase(str, locale)
  • indexOf / lastIndexOf
  • replace (probably both searching for regexp and searching for string, I'm not sure)
  • For JVM languages it's also convenient to have equalsIgnoreCase and compareIgnoreCase, but those are admittedly very JVM-specific, and so far I haven't seen data to indicate them being hot in many real-world applications.

I'm not saying these all have to exist, we can also live with modules importing a mix of WebAssembly.String.* builtins and additional custom imports to fill in the gaps.
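For example, instantiation could mix the builtin namespace with custom glue along these lines (import shapes invented for illustration):

let imports = {
  builtins: WebAssembly.String, // hypothetical builtin namespace, passed through
  extra: {
    fromI32: (x) => (x | 0).toString(),
    toLowerCase: (s) => s.toLowerCase(),
    indexOf: (s, needle, from) => s.indexOf(needle, from),
  },
};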

Things that are superfluous

(8) I think fromCharCode is wholly subsumed by fromCodePoint. (That said, there's no harm in having both.)

Higher-level thoughts

This approach deepens the rift between Wasm modules targeting browsers and those targeting non-browsers, and I for one don't think that's a good thing. While there are certain differences in use cases and available technologies anyway (e.g., you're unlikely to run your Canvas-drawing UI on a server, or your filesystem-using data storage backend in a browser, although there are counter-examples for both of these statements), I think the whole story of Node.js over the past decade-and-a-half has shown us that people really like to share code between front-end and back-end of their application deployments. In particular, libraries dealing with strings (directly, or coincidentally as inputs/outputs of arbitrary functionality) seem like prime candidates for write-once-run-anywhere, and having a browser-specific approach to strings (with yet-to-be-found polyfills/alternatives for non-browser environments) is likely going to add friction to that.

Thanks for the feedback on these builtins, that's very helpful. I'm going to focus on the parts that seem the most important for now.

(2) An equivalent of a string.const instruction, i.e. some way to use string constants in global initializers. The Dart team says that this is very important for Dart/Flutter. This seems difficult to address (as in: I don't see a way), but maybe you have an idea?

My first thoughts for how this could be done would be to import these string constants as globals.

For example in the module you'd have:

(module
  (import "strings" "constantName" (global (ref extern))) (; or using type imports ;)
)

And when instantiating it you'd have:

let imports = {
  "strings": {
    "constantName": "constantValue",
  }
};
WebAssembly.instantiate(module, imports);

I'd expect that to be size efficient, and also use the optimized string constant machinery already in JS runtimes. If you need a string as a subexpression in a global initializer, you would need to break it into multiple globals.

(5) Support for UTF-8 based managed languages (that want to compile to WasmGC). That's a niche right now so in the near-term future it's not a huge problem, but it's worth pointing out that that's something this approach doesn't offer, and also doesn't have an obvious path towards offering in the future.

I'm a bit doubtful that stringref without native engine support for a UTF-8 representation in JS string is actually useful for UTF-8 languages to target. Discussing this here.

And if we're adding native engine support for UTF-8 backing of JS strings, then it seems like it would be useful for this to be exposed to JS as well. We could add .lengthUTF8 and .codePointAtUTF8 to JS String, which apply a UTF-8 interpretation to length and indexing. At that point it would be natural to extend the wasm builtins to support those as well.
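Neither method exists today; an inefficient JS polyfill for the first could look something like:

function lengthUTF8(string) {
  // Encodes just to count bytes; a native implementation backed by
  // UTF-8 storage would avoid the copy entirely.
  return new TextEncoder().encode(string).length;
}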

So I think the UTF-8 issue can be orthogonal to whether we use builtins or a core wasm type here.

Higher-level thoughts

This approach deepens the rift between Wasm modules targeting browsers and those targeting non-browsers, and I for one don't think that's a good thing. While there are certain differences in use cases and available technologies anyway (e.g., you're unlikely to run your Canvas-drawing UI on a server, or your filesystem-using data storage backend in a browser, although there are counter-examples for both of these statements), I think the whole story of Node.js over the past decade-and-a-half has shown us that people really like to share code between front-end and back-end of their application deployments. In particular, libraries dealing with strings (directly, or coincidentally as inputs/outputs of arbitrary functionality) seem like prime candidates for write-once-run-anywhere, and having a browser-specific approach to strings (with yet-to-be-found polyfills/alternatives for non-browser environments) is likely going to add friction to that.

I guess I view this less as 'deepening the rift' and more of 'not exporting the JS environment to non-JS environments'.

I agree that write-once-run-anywhere would be really nice to have. But I don't see how that's feasible for WebAssembly modules to target both WASI and the browser in the same binary. You need to pick which host environment you're compiling your code for and have separate binaries.

And once you have separate binaries, the main issue is toolchain complexity for generating separate binaries. I'm sympathetic to this, but I think there are several mitigating factors:

  1. This will not affect all languages porting to WebAssembly. Some languages will need/want to have their own String implementation.
  2. Languages can port to the Web without using the JS String builtins and copy on the boundary
  3. Languages could only support the JS String builtins and not have WASI or other wasm embedding support
  4. Languages could only support the JS String builtins and use a JS String polyfill

I guess I view this less as 'deepening the rift' and more of 'not exporting the JS environment to non-JS environments'.

That seems to be the key difference in our perspectives.
I see stringref as "designing a general-purpose (in particular: abstracting over encoding choices) string feature for Wasm's needs", which is decidedly not the same as just taking JS strings (as e.g. the UTF-8 topic clearly illustrates).
Importing JS strings as builtins, OTOH, means telling managed languages "if you want fast strings in Wasm, you should import JS strings", which means when someone wants to run such a module outside a browser, they have to figure out a way to export the JS environment to this non-JS environment. So it seems to me that making Wasm* rely on imported JS strings creates the situation that it claims/intends to avoid.

*) specifically: WasmGC-enabled modules compiled from managed languages.

It seems like there is a direct tension between the goals of:

  1. providing wasm running in browsers efficient access to all the JS and Web API objects that wasm might need efficient access to over time
  2. allowing wasm to run with portable and predictable performance across the wide variety of hosts that we're seeing wasm pop up in outside the browser

Goal 1 wants to expose as much of JS and the Web as possible and, from what I can see, in many cases the applications that want this efficient access don't care about running outside the browser because they are inherently web apps. Thus, if you hide an efficient operation in pursuit of Goal 2, it will just frustrate these applications for little benefit to them. Also, Goal 1 doesn't just want strings; over time, Goal 1 wants typed arrays, JS arrays and objects (the inline fast/common paths), etc. The approach outlined by Ryan here seems very aligned with Goal 1.

Second, host-string-interop in browsers is, iiuc, an urgent problem to solve whereas I don't see the same pain points outside the browser now or in the immediate future. Thus, in the short term, it seems like we can just focus on Goal 1 and let the outside-the-browser ecosystem develop to the point where there is a concrete problem to solve, by which time we'll have a more concrete context to evaluate a solution that respects Goal 2.

For the set of adventurous languages that are currently or will soon be trying to generate wasm running inside and outside the browser that want to share strings with the browser, I'd suggest having their compiler emit core wasm that calls the imported JS string built-ins and then "fixing up" this core module to run outside the browser as a post-processing step. One way to fixup the core module would be to implement the JS string built-ins in another core module using wasm GC and then use Binaryen to inline them at build-time. While that theoretically leaves performance on the table, today, it's all linear memory all the time via WASI Preview 1 and vendor-specific APIs. With WASI Preview 2 and the Component Model, new opportunities open up due to the abstraction over low-level memory representation, but this is a whole separate story that will take time to develop and discuss and may not require stringref at all, all of which suggests that we be conservative in the short-term and not assume outside-the-browser needs stringref.

wingo commented

I think that this tension may exist but it does not apply to strings. With my compiling-Scheme-to-Wasm-GC hat on, I don't want JS strings -- I want strings from the host, whatever host. My current targets are the browser and headless python+wasmtime (once wasmtime does gc). JavaScript is one host. I would be just as happy with a utf-8 implementation of strings on wasmtime. But I want cheap intracomponent communication with host-provided string libraries (regex, icu, etc). I have no interest in typed arrays or JS objects; this is a slippery-slope argument that does not apply.

I will not be generating calls to JS builtins because my language does not treat strings as UTF-16 sequences. Because I target non-web also, I will not use a pre-imported type to implement strings. Builtins help me on the boundary when invoked on the JS side for faster array/string conversion; nothing else.

Saying "host-string interop in browsers is urgent" misses the point. If it's urgent for a source language on the browser it is also urgent on non-browser, unless non-browser doesn't want to support programs written in the given source language.

I have no interest in typed arrays or JS objects; this is a slippery-slope argument that does not apply.

Perhaps for the Scheme use case, but for use cases trying to use e.g. WebGPU (or any other JS/Web API that requires transferring large amounts of data) from WasmGC programs, being able to get data into and out of typed arrays quickly is extremely important and not a hypothetical slippery slope at all. I agree with @lukewagner that the approach here extends nicely to cover that use case and many others that will come up.

wingo commented

To be clear I think builtins are a good thing! I just mean that they don't do much for me as regards string representation.

I guess I view this less as 'deepening the rift' and more of 'not exporting the JS environment to non-JS environments'.

That seems to be the key difference in our perspectives. I see stringref as "designing a general-purpose (in particular: abstracting over encoding choices) string feature for Wasm's needs", which is decidedly not the same as just taking JS strings (as e.g. the UTF-8 topic clearly illustrates). Importing JS strings as builtins, OTOH, means telling managed languages "if you want fast strings in Wasm, you should import JS strings", which means when someone wants to run such a module outside a browser, they have to figure out a way to export the JS environment to this non-JS environment. So it seems to me that making Wasm* rely on imported JS strings creates the situation that it claims/intends to avoid.

*) specifically: WasmGC-enabled modules compiled from managed languages.

I don't think it's possible to design a general-purpose string feature for wasm. There is such a diversity of string types in languages that anything we design will either not be useful to many languages or be a poor version of any particular language's string type.

You can abstract away the data representation, but that forces any performance-critical operation to live in the host, which can access the data representation in its native format. Otherwise your code may end up on a host with a different native encoding (e.g. UTF-8 strings on the Web, where the native format is UTF-16) and run into copies when accessing your strings. There's a huge diversity of common string operations, and so I think this will inevitably lead to a large library of string operations added to wasm (or some cross-host import library).

I think the Wasm-GC approach of adding types that expose the data representation, so that source language compilers can specify exactly what they need, is better and more scalable. If there were no consideration of hosts, then Wasm-GC would let languages build exactly the string representation they want, which is a better world than with stringref.

And when it comes to hosts, I think this is primarily a problem on the web, where host APIs use JS strings. The diverse set of non-Web platforms can design their APIs to efficiently use GC and linear memory types.

I will not be generating calls to JS builtins because my language does not treat strings as UTF-16 sequences. Because I target non-web also, I will not use a pre-imported type to implement strings. Builtins help me on the boundary when invoked on the JS side for faster array/string conversion; nothing else.

You should be able to get the same performance using JS string builtins as you would with stringref in browsers. Yes, JS strings are UTF-16 sequences, but so is stringref in browsers. You can get a stringview_utf8 with stringref to pay the cost of conversion, but you could also do the conversion step with JS strings to a GC array as well for the same cost.

I am personally ok with either stringref or builtins (your proposal looks good to me, @eqrion), but to add to what @wingo said, I think there are two main ways of porting languages:

  1. Port an existing VM, like recompiling the Dart VM to x86 or ARM or even wasm. Maybe implement a JIT backend for the new architecture, but otherwise just recompile.
  2. Deep integration with a host VM, like ports to JavaScript, the JVM, and the CLR tend to do: compile a language's objects down to objects on the host, e.g. dart2js compiles Dart objects to JS objects, etc. This option may cause language semantics to change slightly (e.g. dart2js numbers are JS Numbers; Python on .NET has different strings than CPython; GC finalizers can differ, etc.). That has issues in some cases, but it is actually what you want in others, when host integration is your goal.

I understand @wingo to be asking for wasm to support that host-integration option for strings: that is, to let Scheme be compiled to wasm using WasmGC + stringref, similar to how you would port Scheme to the JVM or CLR. Is that right @wingo?

If so then I think it's a reasonable request because it's common practice on other popular VMs and it has precedent in wasm in the form of WasmGC which provides deep integration with the host GC.

The big question is whether the wasm community wants to support such host integration in general, or not. I can see reasonable arguments both ways. If we draw some line in the middle - say, yes to GC but not to strings - then we should explain why, as that doesn't seem obvious.

wingo commented

I will not be generating calls to JS builtins because my language does not treat strings as UTF-16 sequences. Because I target non-web also, I will not use a pre-imported type to implement strings. Builtins help me on the boundary when invoked on the JS side for faster array/string conversion; nothing else.

You should be able to get the same performance using JS string builtins as you would with stringref in browsers. Yes, JS strings are UTF-16 sequences, but so is stringref in browsers. You can get a stringview_utf8 with stringref to pay the cost of conversion, but you could also do the conversion step with JS strings to a GC array as well for the same cost.

The argument isn't that perf would be bad if I used builtins. Indeed with stringref I could get similar performance even on non-browser (because I don't depend strongly on UTF-8 vs UTF-16 encodings). With builtins I have no non-browser story, and the builtins solution requires much more browser-specific compilation work. If I have to do all that, I might as well use utf-8 arrays because unlike builtins, that doesn't close the door on non-browser.

wingo commented

Apologies for my grumpiness, by the way. I realize we're all here to solve problems. I am just trying to make sure my use case is heard and I appreciate the engagement.

And yes, I think it is as @kripken notes -- that I am trying to thread the needle in a way, to have some host integration for strings, ideally in a semantics-preserving way. We all have different tradeoffs -- I wouldn't have made the dart2js numbers-are-JS-numbers choice, but I would have done the Python .NET string one. For me host strings offer more benefits to the system than downsides, but reasonable people may disagree, and indeed I would expect some source languages to prefer to have their own strings.

+1 to everything @eqrion says above.

@kripken:

similar to how you would port Scheme to the JVM or CLR.

But that's the crux: Wasm isn't like the JVM or the CLR, and very intentionally so. Those have rich object systems and highly opinionated sets of types that are basically an alternative syntax for Java or C#. They are also great targets for compiling another language to -- but only if that language happens to be a variation of some Java or C# subset (or can be coerced into one). If not, then you typically end up in a world of impedance mismatches and inefficiencies all over the place.

Wasm consciously took the opposite approach, being as unopinionated and low-level as possible. (And yes, that applies even to GC -- we did not introduce objects, inheritance, vtables, bignums, or anything like that, even though that would surely make compiling some languages easier. You have to build them yourself. Same goes for strings. But then you can build them like you actually need them.)

@rossberg

I definitely agree as to that difference between wasm and the JVM or CLR. Yes, WasmGC is as low-level as possible given the constraints, as you said.

But @wingo's points have made me see another side to this. Yes, we want to only add low-level things to Wasm, but maybe stringref is the lowest-level, actually.

Specifically, we know that in a JS environment (Web, node, etc.) there is obvious value in accessing the platform's strings. But wasm doesn't just run on JS VMs, and there are platform strings elsewhere, like wasm on the JVM or CLR (which exist today) and in other environments (as @wingo said).

We can just let each of those platforms solve string integration itself. A Wasm-JS API for strings solves that for JS. But I think there is potentially great value in providing a standard way for all platforms to do so. That would avoid duplicated effort and fragmentation. And a minimal stringref may be the lowest-level way to do that.

To be clear, I'm a web person, so I would be happy with builtins + a JS string API! But when I think of the entire ecosystem I wonder if a more general solution may be better.

But I think there is potentially great value in providing a standard way for all platforms to do so.

Given all the many conflicting ways in which both host platforms and languages might represent strings, I don't think that is achievable in a useful manner, for all the reasons @eqrion mentioned -- in the same way that we cannot achieve a standard way of accessing platform objects or any other form of non-machine value by means of Wasm instructions. Strings are not special in that regard. Just trying to think about unifying the needs of a JS host and a C host (via the C API) already gives me a panic attack, and their strings are still relatively close. In the end, anything portable on that level will require a separate mechanism like interface types.

@rossberg

I agree that for truly portable strings across all languages we need something else (like Interface Types with copying). And that is an important use case, you are right.

But the use case here is separate. Here we want to reuse the host string type. Again, this is similar to how one ports a language to JS or the JVM or CLR: one adopts the host string semantics. In that case we want efficient use of host strings, whatever they may be. This approach does not even try to unify the needs of two different languages, so it is fine that such unification is impossible in general as you said.

I'm not sure of the importance of this case. But I think it's worth thinking about, at least, since it's come up more than once, and it is useful on other VMs.

@kripken, when I use something like the C API to embed Wasm, then C strings are my host string type. So even a host-oriented mechanism would have to be language-portable in that way.

titzer commented

I think it's more important for Wasm to have the right mechanisms to build platforms and languages than to pull platform features into the core language. I generally agree with @eqrion's and @rossberg's points above. I think stringref is a platform feature.

We should integrate over O(decades) worth of platforms, both existing and new, adopting Wasm. Each will come with their version of the stringref problem: some opaque types, values, and operations on those types and values. I see stringref as a very important and common use case, but not fundamentally different than future use cases (e.g. handles to database rows, CHERI capabilities, or some other exotic platform or hardware feature).

@rossberg

You are right that stringref might not make sense for all cases. A runtime using C strings, as you said, might not want to implement stringref for those. As I see it, stringref is not something every wasm VM will want to implement, just like some VMs might not implement threads etc.

The set of use cases I (and I think @wingo) have in mind are runtimes with reference-counted or GC strings, such as: wasm on the Web, the JVM, the CLR, a runtime written in C++ or Rust that does have managed strings (say, a game engine), etc.

Overall, do you agree that this is a valid use case, even if you think it matters less? I'm not sure from your last comments if you are pointing out limitations in the use case (which are valid!), or if you are still trying to understand the use case, or if you don't see value in it.

@kripken, I agree that it has a use case, but that is not a sufficient condition – just about anything can be motivated with use cases.

The problem is that the use case is limited and moreover, it is not clear how the feature should operate outside the environments where this use case applies – as a core language and portability feature, it would need to have a suitable semantics everywhere. And if we can't achieve universal portability, then the argument for putting it in the core language is moot. As @titzer pointed out, an interop feature that makes fairly concrete assumptions about the host side is better suited as part of the host-specific interop layer(s).

First off: since we're talking about GC'ed strings, standardizing them (by pursuing the stringref proposal, or another design) is a GC post-MVP feature. Modules compiled from C++/Rust/... generally won't benefit from it, just like they don't benefit from GC. If you're currently focused on scenarios that don't need WasmGC, then it makes total sense that you don't see a need for standardized GC'ed strings either.

Now, if you do care about bringing applications built on WasmGC to production:

The ability to re-use host strings in cases where they already exist is really only part of the picture, and certainly not enough motivation on its own. In particular, the ability to re-use implementations is not a reason for anything at all (implementing things anew is always justified if that's better for Wasm); what matters is the ability to have low-overhead interop with embedder/environment infrastructure.

While the specific scenario of reusing existing host strings largely only applies to browsers, there is a variant of it for Wasm hosts that don't have strings yet: suppose a Wasm deployment environment wants to offer a regexp engine, or a library of locale-aware operations, or other directly or indirectly string-related infrastructure for Wasm modules to use. It doesn't even matter whether the idea is to offer these things as engine-internal builtins, as native third-party libraries (à la Python or Node.js native modules), or as third-party Wasm modules. When every toolchain cooks up its own homegrown string representation, there's no generally efficient way to define the interfaces to such infrastructure. If Wasm had a standard string type, that would be very useful in particular for such non-standardized functionality building on top of it.
(I don't see this as a new concept. The existing Wasm world crucially depends on everyone using the same i32 etc, and of course these standard types are used at the boundary between unrelated modules, or between modules and hosts, all the time. By adding a standard string type, we'd be extending the set of standard types that are available for commonplace interface definitions from "just numbers" to "numbers and strings", which is a very useful increase in expressiveness, considering how common strings and string-related operations are.)
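
As a toy illustration of that boundary point (the module name "host:regexp" and these functions are invented for illustration; no proposal defines this interface), a host-provided regexp facility could expose its interface over the shared string type directly, with no per-toolchain encoding shims:

```js
// Sketch: if hosts and modules share a standard string type, an interface
// like this needs no copy/re-encode adapter per toolchain.
const hostRegexp = {
  test: (pattern, subject) => new RegExp(pattern).test(subject),
  firstMatch: (pattern, subject) => {
    const m = new RegExp(pattern).exec(subject);
    return m ? m[0] : null; // null when there is no match
  },
};
```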

The other important aspect of built-in strings is completely unrelated to existing host strings, and that is the speed of "macro" operations (processing an entire string at once). As an illustrative example: performing computations in terms of i32 is great, but when you want to show the result of the computation to the user, you need an integer-to-string conversion. Compiling the JVM's highly optimized (caching and shortcutting and lookup-table-using and operating-on-two-digits-at-a-time-when-possible) implementation to userspace Wasm not only creates bigger modules but is also about 3x slower than using V8's internal implementation of the same operation. This is why I've previously compared string bulk operations to memory bulk operations. Maybe in a perfect world, native memcpy wouldn't be faster than a user-space byte-wise copying loop, but in practice it is much faster, and there is no indication that that will change in the future. The same is true for string operations. And it's worth noting that coming up with a way to inline imported functions ("pre-imports" or otherwise), while potentially useful for other things, doesn't solve this.
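
To make the bulk-operation gap concrete, here is a deliberately naive sketch (not the JVM's or V8's actual code) contrasting a user-space digit-by-digit conversion, of the kind a toolchain might compile into a module, with a single call into the host's optimized routine:

```js
// User-space integer-to-string: the kind of loop a module would ship itself.
// A real port would be cleverer (two digits at a time, lookup tables), but it
// still competes with the engine's internal implementation.
function itoa(n) {
  if (n === 0) return "0";
  const neg = n < 0;
  let m = Math.abs(n);
  let out = "";
  while (m > 0) {
    out = String.fromCharCode(48 + (m % 10)) + out; // prepend next digit
    m = Math.floor(m / 10);
  }
  return neg ? "-" + out : out;
}

// The host path is one call into the engine's optimized internal routine:
const s = String(12345); // "12345"
```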

It turned out (somewhat surprisingly to everyone involved) that these two aspects of fast strings (fast environment interop, and fast bulk operations) are crucially important for applications compiling to WasmGC, up to the point of being launch-blocking if one of their requirements is to be faster than their legacy version that's compiled to JavaScript.

So the question we're facing is: do we add GC'ed strings to core Wasm (where we can design them to serve all GC'ed languages, and where they will be available on all feature-complete engines), or do we settle for some browser-only solution that's narrowly targeted at Java and very similar languages?
While the latter would be good enough to satisfy our immediate needs, I think the former approach is vastly superior in the long run, and much more in line with previous design decisions. For example, we didn't make WasmGC a browser-only thing, despite browser engines being the only implementations that could re-use existing garbage collectors for it; and we took great care to ensure that it serves more languages than Java, despite Java being one of its most prominent early adopters.
I'm very much surprised that so many folks on this thread are arguing so strongly that for strings we should do the browser-only Java-specific thing instead of designing a more general solution.

Maybe what some of you are feeling compelled to argue against is the idea of extending generality in the "not just browsers" sense without also extending generality in the "not just Java" sense. I agree that such an outcome would indeed be undesirable: we shouldn't put Java-specific primitives into core Wasm. And that's not what the stringref proposal is doing: since its inception, it was designed to give no preferential treatment to Java-style strings. If you think there are cases where it's still falling short of that goal, then how about we iterate on it, rather than giving up on the entire idea of standardizing generally-useful GC'ed strings?

I see stringref as [...] not fundamentally different than [...] some other exotic platform or hardware feature

I would characterize the fact that strings are not at all "exotic" as a fairly fundamental difference.

Given all the many conflicting ways in which both host platforms and languages might represent strings, I don't think that is achievable in a useful manner

All the many conflicting ways in which both host platforms and languages might represent garbage-collected objects didn't stop us from achieving a generally-useful design for WasmGC.

it would need to have a suitable semantics everywhere

Is there a concrete place where the stringref proposal currently does not have a "suitable semantics"?

Quick reply, since I have to run:

I would characterize the fact that strings are not at all "exotic" as a fairly fundamental difference.

I beg to differ. In comparison to how most other languages represent/implement them, strings in JS engines with their built-in roping, slicing, flattening, multiple representations, etc. are rather exotic.

All the many conflicting ways in which both host platforms and languages might represent garbage-collected objects didn't stop us from achieving a generally-useful design for WasmGC.

We made no attempt to define Wasm GC objects to be interchangeable with host objects, e.g., you are not able to view existing JS objects as Wasm objects. The apples-to-apples equivalent for string types, if they were primitive in Wasm, would hence be a Wasm string type that is different from JS strings, and that does not allow arbitrary JS strings.

Is there a concrete place where the stringref proposal currently does not have a "suitable semantics"?

If the goal is efficient portable host interop, how would stringref be mapped onto C strings or other non-JS host string types? And how would that be done without wildly varying performance characteristics? For example, C strings have O(n) length and O(n) concatenation.

strings in JS engines with their built-in roping, slicing, flattening, multiple representations, etc. are rather exotic.

The fact that strings can be concatenated and that substrings can be created is not exotic at all; those are two of the most basic string operations. Everything else is an implementation detail, and neither standardized nor suggested to be standardized.

We made no attempt to define Wasm GC objects to be interchangeable with host objects, e.g., you are not able to view existing JS objects as Wasm objects.

True, because for JS objects that would have been somewhere between infeasible and unreasonable. The strings of the world have much more in common with each other than the objects of the world, so making them interchangeable as a secondary design goal is very much feasible. That said, the stringref proposal does not require this, it only allows it, and I don't see how that's a drawback.
For the sake of argument, assume that nobody had had the idea yet that it would be beneficial if Wasm strings were interchangeable with host strings (in browsers or elsewhere), and the task was only to define a Wasm string primitive that's a useful building block for managed source languages (to tap into the second benefit I mentioned above, fast native bulk operations). I posit that not having the design goal of interchangeability would not affect the outcome of this design process; rather, at the end of it you'd realize that interchangeability naturally falls out as an accidental additional benefit. It does not create hurdles or constraints.

how would stringref be mapped onto C strings

Please read the first paragraph of my previous post again. (Phrased differently: same as how structrefs are mapped onto C structs: they aren't.)

C strings have O(n) length and O(n) concatenation.

And we could specify the same for Wasm strings (if we want to start formally specifying the complexity of instructions). O(n) concatenation is a very reasonable upper bound, and I for one don't think we should require implementations to apply any tricks to pretend to be faster than that. O(n) for length is somewhat more debatable; in particular I'm under the impression that GC'ed string implementations (for any language) tend to strongly want O(1) length internally anyway to make their GC faster, so I wouldn't be surprised if requiring O(1) length for (managed! Again, not C!) Wasm strings would actually be fine in practice.

Since we're on the topic of concatenation: While people here are worrying about possible undesirable effects of allowing concatenation to potentially be O(1), elsewhere:

  • JavaScript spec folks are contemplating adding a "string builder" feature to their language, because it turns out that "magic O(1)" concatenation is often not what you want, and in fact can be slower than traditional copying-based O(n) concatenation implementations; it depends on the use case.
  • JavaScript engine implementers are experimenting with under-the-hood tricks to automatically employ string builder like mechanisms in certain cases that their optimizing compilers can detect.
  • The existing stringref-targeting J2Wasm implementation doesn't use the string.concat instruction at all; instead, it rewrites string additions to string builder usages. While I wouldn't be surprised if this implementation choice were re-evaluated at some point in the future, it does illustrate that having string concatenation that might be O(1) in Wasm does not hopelessly outclass any other implementation technique, and hence does not "force" all implementations to hop onto that train.

The fact that strings can be concatenated and that substrings can be created is not exotic at all;

But it is exotic that there are O(1) primitives for it. And we cannot let the cost of such operations vary between O(1) and O(n) across implementations, because that would turn very common loops from O(n) into O(n^2). So there at least needs to be a de facto standard for the cost of such operations: if it is O(1), then we have exotic strings; if it is O(n), then they cannot be JS strings.

True, because for JS objects that would have been somewhere between infeasible and unreasonable.

Yes, and what some of us are saying is that the same is true for strings and any other non-machine type, once you widen the perspective.

same as how structrefs are mapped onto C structs: they aren't.

But then they could just as well use array(i8). The "portable host interop" argument becomes moot then, as it only works for JS. In which case it is better suited to the JS interop API, which is what this issue proposes.

PS: Really, the premise of free host interop is highly questionable beyond JavaScript. It only works if both the host language and Wasm are GC'ed, and they share the same heap. That essentially requires both to be implemented as one engine. For historical reasons, that happens to be the case for Wasm-in-JS implementations, but that is a super special case, and it is doubtful that anybody is interested in doing that for any other language VM.

But it is that there are O(1) primitives for it.

See my previous post for why this concern is way overrated.

If the spec specifies the complexity of these operations at all, it should specify it as "O(n)". As I'm sure you're fully aware, "O(1) concat" actually just shifts the cost around: if a subsequent operation flattens the rope, then the overall complexity is Ω(n). Yes, naive loops can degenerate, but such loops are also easy to avoid by toolchains (I've quoted an existing example above).

We can also totally specify+implement concat/slice as Ω(n) in Wasm if that assuages concerns. The aim of the stringref proposal is not to specify or import any particular implementation details or performance characteristics; the aim is to provide a useful (in several ways) and general managed string primitive.

Also, having a standard string type would still provide most of the benefits I described earlier if we dropped concat and slice operations from the proposal entirely. So if these two are the big concerns, that would be one possible way forward.

we cannot let the cost of such operations vary between O(1) and O(n) across implementations

But that's exactly what will happen when modules running in browsers import browser strings, while modules running in other environments satisfy their string imports with simpler implementations (provided by those hosts or third-party Wasm modules).

The "portable host interop" argument

As I said above, that's one half of one of two important arguments. You're still entirely ignoring the second half, and the second argument.

they share the same heap

Sharing the heap between host and Wasm was the whole point of the GC proposal. And I do believe that's a worthwhile goal; having several independent heaps in the same application leads to all sorts of sadness, as we know from experience. So I think it's fair to expect that GC-enabled Wasm hosts will share the same heap (as long as the host has a heap at all; and even if it doesn't, since it's the one providing the heap, it knows how to deal with it).


Look, @rossberg, I get that you've made up your mind against strings, and won't be convinced otherwise. Fine.

My main point still stands: I'm surprised that so many people here, including you, are arguing so strongly that a browser-specific solution for something as common as GC'ed strings is somehow better than making Wasm the same platform everywhere. I thought providing a common compilation target platform was the point.

Luckily, from a selfishly narrow-minded browser perspective, we can live with a browser-only solution, so if that's the only way forward that this community can agree on, then so be it.

If the spec specifies the complexity of these operations at all, it should specify it as "O(n)". As I'm sure you're fully aware, "O(1) concat" actually just shifts the cost around: if a subsequent operation flattens the rope, then the overall complexity is Ω(n). Yes, naive loops can degenerate, but such loops are also easy to avoid by toolchains (I've quoted an existing example above).

This argument works if the toolchain controls the whole algorithm so that it can move the costs around as you suggest; if that were the extent of the problem, then I would agree with you that the problem was overrated. However, what will happen very naturally (and without anyone doing anything obviously "wrong") is that a toolchain will compile its source-language concatenation/slicing/substringing operators and library functions to the analogous stringref ops, and then it will be the myriad source-language developers, understandably unaware of the subtleties at play, who unknowingly write code that critically depends on a certain loop executing in O(n). Having personally made accidental changes to the SM JS string representation that broke such implicit assumptions, and then immediately had them reported as "this site is broken on Firefox", I can say this is not speculative; it's a matter of "when", not "if".

Thanks, @lukewagner . Three questions and a clarification in response:

  • If we dropped the concat and slice operations, would that address your concern?
  • If we required engines to implement concat and slice no faster than Ω(n), would that address your concern?
  • In what way does importing browser-typical O(1) concat and slice operations address your concern? Especially if non-browser Wasm runtimes are betting on satisfying string-related imports some other way (e.g. with third-party modules), doesn't that run into all of the same risks?

This argument works if the toolchain controls the whole algorithm so that it can move the costs around as you suggest

J2Wasm is not a smart optimizing compiler and has no control over "the whole algorithm"; it essentially just walks over AST nodes and applies local transformations. It replaces all string concatenation operators in the Java source with StringBuilder usages, i.e. copying characters into a buffer array and creating a new string from that buffer array at the end. Chains of + use the same StringBuilder (I think), while a singular + creates a StringBuilder for just those two string parts. If we dropped built-in concat, we'd basically force all toolchains to always do it that way. (And similarly for slice.)
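
A rough sketch of that lowering (assuming nothing about J2Wasm's actual generated code): the concatenation chain is rewritten to fill one buffer and materialize a single string at the end:

```js
// Builder-style concatenation: one buffer fill plus one O(n) string creation.
function concatViaBuilder(parts) {
  let len = 0;
  for (const p of parts) len += p.length;
  const buf = new Uint16Array(len); // the "buffer array"
  let pos = 0;
  for (const p of parts) {
    for (let i = 0; i < p.length; i++) buf[pos++] = p.charCodeAt(i);
  }
  // Fine for short strings; a real toolchain would chunk this call.
  return String.fromCharCode(...buf);
}
// A chain like `a + b + c` becomes concatViaBuilder([a, b, c]).
```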

@rossberg

Really, the premise of free host interop is highly questionable beyond JavaScript. It only works if both the host language and Wasm are GC'ed, and they share the same heap. That essentially requires both to be implemented as one engine. For historical reasons, that happens to be the case for Wasm-in-JS implementations, but that is a super special case, and it is doubtful that anybody is interested in doing that for any other language VM.

I do not think we should assume JS is the only environment that wants such integration. In fact, maybe that assumption underlies some of the disagreement here?

First, JS isn't unique because we already have people running wasm lowered to other runtimes. For example, asmble compiles wasm to JVM bytecode, and dotnet-webassembly does the same for the CLR. AFAIK neither of those support Wasm GC yet, but the natural implementation would be to use JVM/.NET objects.

Second, popular wasm runtimes like wasmtime and wasmer have bindings for the JVM and CLR. Those would presumably not reuse JVM or CLR objects for WasmGC, of course, but they show the motivation for running wasm on those VMs. And, once you do that, there can be use cases where you do want a single VM, simply for performance - exactly for the same reasons as JS does today for both GC and strings.

Overall, it is true that only JS needs this urgently now, but we shouldn't infer that nothing else will. I draw the opposite conclusion: that JS is an early adopter.

With all that said, maybe the best solution is still for each VM to do its own thing outside of the core wasm spec. That's a reasonable point of view as well, given the points raised by @titzer, @rossberg, and @lukewagner. But it does have a potential cost as well in duplication and fragmentation, and I share @jakobkummerow 's concerns there.

Even if we were to specify a string type and operations at a higher level (either in the JS API or some new "VM library specification" document), I hope we could still consider questions like "how would a UTF-8-first runtime implement this?" and "how could this be polyfilled?". We shouldn't fall into the trap of thinking that we have to choose between a platform-agnostic core-language stringref and a single-focus, web-only pre-import stringref. There's space to consider the design of a pre-import stringref that is still somewhat platform-agnostic.

@jakobkummerow Good questions!

If we dropped the concat and slice operations, would that address your concern?

Almost yes: there's also an O(1)-vs-O(n) hazard hiding in random-access indexing and equality. With those gone, what would remain is a pretty spartan set of operations: definitely-O(n)-because-of-mutability copy-to-or-from-linear-or-GC-memory instructions and O(1) code-point iteration instructions (with, I assume, some opaque cursor type to hide the byte offset). While this would do a good job of allowing the host to use its native string representation, thereby avoiding host-to-wasm copies in many situations, it would probably make a poor basis for a language's main string representation. So I would assume that most (although not all) languages would use linear memory or a GC array as the main representation and use this spartan-string as a sort of "delayed host copy" (similar to how JS engines think of ropes as "delayed concatenations").

Thus, I think these spartan-strings could be valuable in core wasm, but only for the narrow problem of avoiding/delaying host copies, which seems to break some of the use cases folks describe above of using stringref as most languages' main string representation. And again, if I'm just building a webapp, I think I'd still want the built-ins (string and others) because I want the whole buffalo.
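
To make the shape of such a spartan operation set concrete, here is a toy JS model; every name is hypothetical, and only copy-in/copy-out plus cursor-based code-point iteration is exposed:

```js
const spartan = {
  // Definitely-O(n) copies across the boundary.
  fromUtf8Bytes: (bytes) => new TextDecoder().decode(bytes),
  toUtf8Bytes: (str) => new TextEncoder().encode(str),
  // O(1) iteration; this object stands in for an opaque cursor type.
  cursorNew: (str) => ({ str, pos: 0 }),
  cursorNext: (cur) => {
    if (cur.pos >= cur.str.length) return -1;  // end of string
    const cp = cur.str.codePointAt(cur.pos);
    cur.pos += cp > 0xffff ? 2 : 1;            // step over surrogate pairs
    return cp;
  },
};
```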

If we required engines to implement concat and slice no faster than Ω(n), would that address your concern?

It's hard to see this not ending in a fig-leaf situation (in which engines do it anyway after a polite waiting period), so no, it wouldn't.

In what way does importing browser-typical O(1) concat and slice operations address your concern? Especially if non-browser Wasm runtimes are betting on satisfying string-related imports some other way (e.g. with third-party modules), doesn't that run into all of the same risks?

Given that host-string-integration is purely a perf optimization and the easiest thing to implement by far is to just copy at the boundary (x10 if you're a linear memory language), I think that is what toolchains will continue to do by default for most languages for some time. Yes, that leaves some performance on the table (although in fewer cases than I think folks are imagining, when you think end-to-end and weight by language prevalence), and so I think it would be a win to add something like the spartan-stringref I described above. But I don't think there's urgency to add this beyond the browser-focused use cases your team is looking at; outside the browser we're still just trying to get things working portably at all. And for those few languages that are on the bleeding edge and want to invest the extra effort to do the optimal thing for the browser but also want to codegen core wasm that runs outside the browser, I think the approach I outlined above works fine as a stopgap.

@lukewagner : I agree that "spartan strings" would be too limited to fulfil the string needs of source languages, and that on their own they would only help with boundary-related tasks. To make them more full-fledged, they could be enhanced with imported functionality (which would be host-specific, i.e. most likely just specified in the JS API, at least for now), where the spartan string would serve as the type of those imports, which would be useful especially as long as type imports aren't a thing yet. How would you feel about such a hybrid design? As a rough sketch, that would probably mean adding the following to core Wasm:

  • a managed string type, presumably with string <: any and none <: string
  • instructions to create strings from linear memory and WasmGC arrays, and to write them back into those places, using a small set of common encodings (presumably at the very least lossy-utf8 and wtf16; I think wobbly-utf8 would also be useful; I'm less sure about strict-utf8, because it provides no performance benefits, and getting a trap when handling user-controlled data is more annoying than helpful)
  • a string.const instruction, and corresponding section type that stores these constants
  • possibly code point iteration, perhaps in the form of the stringview_iter facility in the stringref proposal. (I write "possibly" because I don't mind having it, but I'm also not sure how important/useful it is.)

and leaving everything else up to imports. A possible (but not automatic) future path would be to revisit a set of string operations that may be useful to standardize in core Wasm later on, once we have more bandwidth to think about it, and more clarity on the various needs of various managed languages in various host environments (browsers and others). WDYT? I understand that you don't see urgency for this, but would it be an acceptable compromise to lay some super generic foundations now and leave all debatable details for later?
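
For the "leaving everything else up to imports" half, the JS-API side of such a hybrid might look roughly like the sketch below; the module name "wasm:string" and the function set are placeholders, not proposed spec text:

```js
// Host-specific string functionality supplied via imports, layered on top of
// a core managed string type; the wasm module declares these as imports.
const stringImports = {
  "wasm:string": {                 // placeholder import-module name
    concat: (a, b) => a + b,
    length: (s) => s.length,
    charCodeAt: (s, i) => s.charCodeAt(i),
    equals: (a, b) => a === b,
  },
};
// const { instance } = await WebAssembly.instantiate(bytes, stringImports);
```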


Speaking of types, @eqrion and I have started fleshing out the design in the thread-starting post in this doc:

bit.ly/wasm-imported-strings

with an eye towards prototyping that to find out if it actually works well enough. We'd be happy to get additional feedback on the design specifics.

One open question is: while this approach very much hopes to get imported types eventually, there is a practical need to get something working before the type-imports proposal is finalized. In the short term, we can use either externref or anyref to represent strings; ideally we'd pick the one that will eventually be the top type of all imported types, to avoid breaking changes when type-imports happen. Does anyone (perhaps @rossberg as the champion) have any opinions/predictions on what's most likely there? The existing text appears to assume that externref <: anyref, which is an outdated assumption; now that we have separate hierarchies, we'll have to decide which hierarchy imported types should belong to.
If we decided to standardize a string type, even with very few (or even zero) instructions to go with it, that would also solve this problem.

I would expect to use externref for now and an imported subtype of externref eventually for the imported string API. This is consistent with the idea of using JS strings as directly as possible and avoids having to internalize and externalize the string objects on the boundary (in principle; my guess is that this internalization is a no-op anyway for non-null values).
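
Concretely, that interim shape might look like this sketch (import module and field names are made up for illustration): string-typed parameters and results are declared as externref, so JS strings flow across the boundary unchanged:

```js
// Wasm side (text format, for illustration only):
//   (import "js-string" "length"
//     (func $string_length (param externref) (result i32)))
//
// JS side: plain functions over JS strings satisfy the imports directly.
const importObject = {
  "js-string": {
    length: (s) => s.length,
    charCodeAt: (s, i) => s.charCodeAt(i),
  },
};
```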

@jakobkummerow Yes, great point, a hybrid approach does seem like it would address those concerns pretty well.

eqrion commented

There is now a phase 1 proposal for this. I'm going to close this in favor of discussion on that repository.