dslomov/ecmascript-structured-clone

Structured Cloning is not about Code Realms

Opened this issue · 22 comments

You say:

Structured cloning algorithm defines the semantics of copying a well-defined subset of ECMAScript objects between Code Realms.

I think you know what you are talking about, but if so you are misusing the term "realm" in a way that makes everything else hard to follow.

An ECMAScript "Vat" (to use Mark Miller's term) is an operating instance of an ECMAScript engine. It consists of a heap of objects (and other values) and a single thread of execution. The objects within a Vat may only primitively reference other objects that are within the same Vat. A Vat has one or more (Code) Realms that roughly correspond to an ECMAScript global object and its associated built-in objects. Each ECMAScript function object is associated with a specific Realm. Realms are not in any way isolated from each other. An object created by or associated with some specific Realm may contain primitive references to objects created by or associated with any other Realm of the same Vat. The Vat's thread of control may freely move back and forth among functions that are associated with different Realms.

Vats are much like Unix processes (prior to the introduction of threads). Different Vats are logically isolated from each other. They do not share a heap, logical address space or thread of control. Objects in a Vat A cannot primitively reference objects in a Vat B, and Vat A's thread of control cannot execute code from Vat B or vice versa. Separate Vats can only exchange or share information by going through some sort of shared communication channel and by encoding data from objects into some sort of serialized data that can be transmitted across such channels.
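A minimal sketch of that isolation, assuming a toy model in which each Vat is an object with a private heap and the channel can carry only strings. The names `Vat`, `Channel`, `send` and `receive` are hypothetical and do not come from any ECMAScript or HTML specification, and JSON merely stands in for whatever encoding a real system would use.

```typescript
// Toy channel: it can carry only serialized text, never object references.
class Channel {
  private queue: string[] = [];
  post(message: string): void {
    this.queue.push(message);
  }
  take(): string | undefined {
    return this.queue.shift();
  }
}

// Toy Vat: a private heap of named objects. Nothing outside can reach into it.
class Vat {
  private heap = new Map<string, object>();
  constructor(readonly label: string) {}

  store(key: string, value: object): void {
    this.heap.set(key, value);
  }
  load(key: string): object | undefined {
    return this.heap.get(key);
  }

  // Encode an object from this Vat's heap and post the encoding to the channel.
  send(key: string, channel: Channel): void {
    const value = this.heap.get(key);
    if (value === undefined) throw new Error(`no object named ${key} in Vat ${this.label}`);
    channel.post(JSON.stringify(value)); // JSON is an assumption of this sketch
  }

  // Rebuild a fresh object in this Vat's heap from the next message on the channel.
  receive(key: string, channel: Channel): void {
    const message = channel.take();
    if (message === undefined) throw new Error("channel is empty");
    this.heap.set(key, JSON.parse(message) as object);
  }
}

const vatA = new Vat("A");
const vatB = new Vat("B");
const channel = new Channel();

const original = { title: "hello", tags: ["x", "y"] };
vatA.store("doc", original);
vatA.send("doc", channel);
vatB.receive("doc", channel);

const copy = vatB.load("doc");
console.log(copy !== original);                                 // true
console.log(JSON.stringify(copy) === JSON.stringify(original)); // true
```

The object that ends up in Vat B has the same structure as the original but shares no identity with it, which is the only kind of exchange this model permits.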

Structured Clone is a serialization mechanism for serializing and logically duplicating objects (with reasonable fidelity) between Vats. Not all kinds of objects are reasonably serializable or serializable with high fidelity. For example, it is very difficult to serialize an arbitrary ECMAScript function closure object with full semantic fidelity.

Structured Clone is not needed to primitively transfer data between Realms because any ECMAScript function closure can directly reference and manipulate objects that originated in other Realms. (There may be platform or application reasons for wanting to logically isolate Realms, but that isn't an ES language level concern.)

I think it's very important to not confuse Realms with Vats when talking about Structured Clone. You need to design Structured Clone to work between Vats. If you accomplish that, then the same design can also be applied to logically isolated Realms. The opposite is not necessarily true. A Structured Clone design that works between Realms within a single Vat will not necessarily be able to work between independent Vats.

Trying to map this to real-world manifestations, I believe I am hearing that vats are cross-origin iframes, and realms are same-origin iframes?

@domenic Perhaps. A Vat pretty clearly corresponds to what a process-per-tab browser would associate with each independent tab. I'm less clear about the subtleties of various kinds of iframes.

I was talking at the level of mechanisms, and I think when you are talking about cross-origin or same-origin you are talking about policies which might be implemented using various combinations of mechanisms.

As a language designer I find it most productive to define clean mechanism level abstractions and try to respond to specific problems that platform and application architects may run into using those mechanisms to implement their policies. Also, it's important that a general purpose language like ES that is used in multiple host environments avoid assimilating host specific policies into its mechanisms.

Vats and Realms are pretty clean base mechanisms to build upon.

you are talking about policies which might be implemented using various combinations of mechanisms.

Sort of, but not really. I am mainly just trying to make sure we are grounded in real entities that exist in implementations.

As a language designer I find it most productive to define clean mechanism level abstractions and try to respond to specific problems that platform and application architects may run into using those mechanisms to implement their policies. [...]

Vats and Realms are pretty clean base mechanisms to build upon.

I guess I am trying to reality-check that last claim. That is, do vats and realms map to things that exist in real implementations, and are thus useful concepts to reify in the spec? Or are they abstractions that might have impedance mismatches with how implementations actually work? Without an actual example of a realm or a vat, they are just theorycraft. And if they are in fact a mismatch for existing concepts, they are useless theorycraft, under the assumption that nobody is going to implement something kinda-like-but-not-exactly what they already have.

And it's especially important when we're talking about structured clone, which is a real thing we're trying to codify, to make sure it applies to real things and not to inapplicable concepts.

A Vat pretty clearly corresponds to what a process-per-tab browser would associate with each independent tab.

Hmm. Why does it matter if the browser is process-per-tab? And, how do you structured clone something between two different tabs right now?

A clear example of a "Vat" would be a worker I suppose, with the document that created it being in its own "Vat". However, structured cloning really is between Realms, sometimes from and to the same Realm. I guess the magic ability structured cloning has is that it can cross "Vats".

"Vat" would be hard to define. E.g., due to document.domain, independent "Vats" can become merged, and that is where implementations start to diverge slightly as well.

Conceptual grouping of Realms that can access each other in a "Vat" would help. I'm not sure we have enough implementation experience yet to get there. (E.g. Google is still experimenting with out-of-process <iframe>.)

Thanks Allan, your comment is spot-on. The aim of the structured clone algorithm is to create a copy of a JavaScript value (belonging to a certain subset) in a target realm that is completely independent and has no access or reference to a source realm. I agree that the info text that you object to is misleading and may leave the impression that code realms cannot directly access objects belonging to each other.
So this introductory statement must be fixed.

However, we do not have a definition of "vat" in the ECMAScript spec, and it seems like the body of this specification does not really need it - the algorithm only deals in terms of realms, and it seems to me that even if we had a notion of "vat" the algorithm would still need a specific target realm within that vat to create a clone in.

On the other hand, unless I am mistaken, we also do not have a notion of "a value that has no access to a realm", so I am not sure how to strictly define within the terms of ECMAScript spec the requirements on the result of this algorithm. Ideas?

@dslomov-chromium A "Vat" is essentially what the ES6 spec. defines. It's just easier to say "Vat" than it is to say "an instance of the thing defined by Ecma-262". Multiple Vats might be implemented as multiple ES engine instances running in separate processes on the same machine, or ES engines running on different machines, or even by carefully implementing an engine that can be multiply instantiated within a single OS process (the latter is probably the hardest and least secure way to implement a Vat). Semantically, there should be no difference between these styles of Vat implementation. The major difference that might be relevant is the sort of communications channels that are available for inter-vat communications.

The reason I strongly resist defining structured clone in terms of Realms is that you simply don't need it to communicate between Realms (even the concept of "communicating Realms" isn't really right). Realms within a Vat share a common domain of values (including objects) that they can freely interchange; no serialization is required. So the base problem is really how to replicate values composed of objects between isolated Vats. Solve that problem and you can also apply it to Realm level interchanges if you want.

Note that you can create isolation within a Vat using membranes to isolate Realms, and in some cases browsers may have to do that. But it's hard and likely buggy. If you have a structured clone that works between Vats it will work between such isolated Realms. But you shouldn't start with isolated Realms in a common Vat as your design center for structured clone. There are just too many issues you might gloss over when dealing with Realms in a common Vat. Design for separate Vats and you will end up with a solution that also applies to isolated Realms. You are less likely to get there starting with Realms.

In answer to your last question, you have to address this as a serialization problem: how can you describe a set of objects in a manner that can be transmitted over a communications channel such that a similarly structured set of objects can be reconstituted in another Vat at the other end of the channel?

@annevk Vats and Realms are building blocks that give you the concepts and mechanisms to describe and extend browser behaviors. You want to start with clean building blocks and then pile on the cruft that browsers actually do.

Realms are actually a pretty weird concept that exists in ES6 only because we have to have something to conceptually support what browsers have done with multiple global objects and same origin iframes. In a better world Vats would be all we would need. Multiple globals and built-ins in the same address space creates all sorts of complexity that could have been avoided.

On the other hand, unless I am mistaken, we also do not have a notion of "a value that has no access to a realm",

I have a hard time understanding what you mean by the quoted phrase. I think you may not yet fully understand what an ES6 Realm represents.

@allenwb, you're quoting @dslomov-chromium there (and you already replied to that bit). As for my post, I understand what you're trying to do, I was just giving some examples as to how they line up with things we have today (and where there would be challenges).

@domenic A lot of what we are talking about here is tracking the early evolution of operating systems. Early multitasking operating systems tried to run all programs in a single common address space and depended upon "good behavior" to keep separate programs/users from interfering with each other. When that was discovered to not work so well, some systems tried to use language runtimes and other software mechanisms and conventions (a rough analog to Realms) to limit the individual programs' ability to interfere with each other. That proved difficult, buggy, and hence unreliable. The introduction of hardware memory mapping provided the solution to the problem. Everything becomes simpler when address space isolation is imposed at a low layer of a system. None of this is theory; it's engineering history and the basis of all modern platforms.

The browser platform can and should learn from this history as it evolves.

@annevk Oops, scrolling registration error on my part.

I agree that mapping how Vats and Realms relate to actual browser abstractions is a useful exercise.

I think it's fair to say that more than just a Realm is involved in implementing an iframe. In particular, cross-origin iframes involve at least a Realm and some sort of membrane mechanism. They probably could also be modeled using a Vat + inter-Vat communications channel.

@allenwb sure, I do not think this spec is really about "communicating realms". This spec just uses the "target realm" to create clones of objects from source vat (as I said before, when we communicate between vats, we still need a particular realm in a target vat to create objects). I believe the current spec works between vats with no issues.

I am still not sure whether you object to the intro text, or to the mechanics of the spec itself.

@allenwb replying to this:

On the other hand, unless I am mistaken, we also do not have a notion of "a value that has no access to a realm",

I have a hard time understanding what you mean by the quoted phrase. I think you may not yet fully understand what an ES6 Realm represents.

Sorry, my mistake. What I meant was that "we have no notion of value that has no access to a vat" (the result of structured clone algorithm should have no references to objects in source vat if source vat is not the same as a target vat). For this notion we have no formalism in the ECMA spec.

One obvious strawman on how to resolve this without explaining what vats are.
Instead of structuredClone(object, targetRealm) -> object we can have a pair of functions:
serialize(object) -> blobOfBytes
deserialize(blobOfBytes, targetRealm) -> object
In this case, by definition, deserialize has no access to source vat, therefore the resulting object cannot depend on anything in source vat. This structure has an unfortunate consequence of making the wire (blobOfBytes) explicit however.
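The strawman pair could be sketched roughly as follows. This is an illustration under stated assumptions, not the spec's algorithm: `Realm` is a hypothetical record holding only the one intrinsic the sketch needs, the blob is JSON text standing in for bytes, and only plain data is handled (no cycles, functions or Symbols).

```typescript
// Hypothetical stand-in for a Realm: just the intrinsic this sketch uses.
interface Realm {
  label: string;
  objectPrototype: object;
}

// serialize(object) -> blob. The blob is JSON text standing in for bytes.
function serialize(value: unknown): string {
  const text = JSON.stringify(value);
  // JSON.stringify yields undefined at runtime for functions, Symbols and undefined.
  if (text === undefined) throw new TypeError("value is not serializable in this sketch");
  return text;
}

// deserialize(blob, targetRealm) -> object. It sees only the blob and the target
// Realm, so the result cannot hold references to anything on the sending side.
function deserialize(blob: string, targetRealm: Realm): unknown {
  return JSON.parse(blob, (_key, parsed) => {
    // Re-home plain objects onto the target Realm's Object.prototype analog.
    // Arrays keep the default Array.prototype to keep the sketch short.
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return Object.setPrototypeOf(parsed, targetRealm.objectPrototype);
    }
    return parsed;
  });
}

const targetRealm: Realm = {
  label: "target",
  // Pretend this object is another Realm's Object.prototype.
  objectPrototype: Object.create(Object.prototype),
};

const source = { point: { x: 1, y: 2 }, caption: "start" };
const blob = serialize(source);
const clone = deserialize(blob, targetRealm) as typeof source;

console.log(clone !== source);                                             // true
console.log(clone.point.x);                                                // 1
console.log(Object.getPrototypeOf(clone) === targetRealm.objectPrototype); // true
```

Because `deserialize` receives nothing but the blob and the target Realm, the independence of the result follows from the function's inputs rather than from a separately stated invariant.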

@dslomov-chromium we could leave the exact format to implementations for now and still define it in that way, noting we'd define the exact format in due course. (We'd simply require that what they serialize they also need to be able to parse into the same object.)

@dslomov-chromium

What I meant was that "we have no notion of value that has no access to a vat"

By definition, all ECMAScript values exist in a Vat. But values of the ECMAScript language types Number, Boolean, String, Undefined, and Null can all be externally encoded independent of any Vat and hence can be perfectly replicated in a different Vat. Symbol and Object values both have characteristics (for example, identity) that couple them to a specific Vat so they cannot be perfectly replicated in a different Vat, but in some cases useful analogs of the original values can be created in a different Vat.

I'd say that's what the Structured Clone specification should be all about. Which values can be replicated in a different Vat with what degree of fidelity.
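As a rough illustration of that distinction, the following sketch classifies values by the fidelity with which they could be replicated. The `Fidelity` type and `fidelityOf` function are hypothetical names used only here. BigInt postdates this discussion; treating it as an identity-free primitive is an assumption of the sketch.

```typescript
type Fidelity = "exact" | "analog-only";

// Number, Boolean, String, Undefined and Null can be encoded independent of any
// Vat. Symbol and Object values carry identity, so at best an analog can be
// created elsewhere.
function fidelityOf(value: unknown): Fidelity {
  switch (typeof value) {
    case "number":
    case "boolean":
    case "string":
    case "undefined":
    case "bigint": // assumption: not among the ES6 types listed above
      return "exact";
    case "object":
      return value === null ? "exact" : "analog-only";
    default:
      // "symbol" and "function" land here.
      return "analog-only";
  }
}

console.log(fidelityOf(42));          // exact
console.log(fidelityOf(null));        // exact
console.log(fidelityOf({ a: 1 }));    // analog-only
console.log(fidelityOf(Symbol("s"))); // analog-only

// Identity is what gets lost: two Symbols with the same description are distinct.
console.log(Symbol("s") === Symbol("s")); // false
```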

(the result of structured clone algorithm should have no references to objects in source vat if source vat is not the same as a target vat).

I'd say that there should be no exceptions, like the above. That's why I keep pushing so hard that you need to think about structured clone only from the perspective of imperfect inter-vat replication. An intra-vat clone can do many things that are impossible to do inter-vat. I don't think you want to clutter up the semantics of structured clone with all sorts of "if the source and target vats are the same, do X otherwise do Y". Structured Clone needs to be about Vat-to-Vat replication (where replication to or between Workers is a primary browser example).

Instead of structuredClone(object, targetRealm) -> object

This couldn't possibly describe the inter-vat case because the caller of this function has no way to directly represent the return value if it is an object in a different vat.

In this case, by definition, deserialize has no access to source vat, therefore the resulting object cannot depend on anything in source vat. This structure has an unfortunate consequence of making the wire (blobOfBytes) explicit however.

Exactly, serialization/deserialization is what you need to do. But you don't have to define a concrete wire format. You can specify the serialization/deserialization in terms of an abstract model (for example the BNF or an AST for a (hopefully) simple abstract object description language). You might not ever actually describe an actual wire format for that language (leave it to implementations), or an implementation might not actually encode/decode to it. For example, if shared memory is available at the Vat implementation level then direct generation of target Vat objects may be possible. But the important thing is that the replication semantics are constrained by the abstract serialization/deserialization process.
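One possible shape for such an abstract model, sketched under assumptions: a flat table of nodes in which objects and arrays refer to other nodes by index. The types `AbstractNode` and `Description` and the functions `toDescription` and `fromDescription` are hypothetical names invented for this illustration, and only plain objects, arrays and primitives are covered. No wire format is implied; the table is just an in-memory value.

```typescript
type Ref = number; // index into Description.nodes
type AbstractNode =
  | { kind: "primitive"; value: number | string | boolean | null }
  | { kind: "undefined" }
  | { kind: "array"; items: Ref[] }
  | { kind: "object"; entries: [string, Ref][] };

interface Description {
  root: Ref;
  nodes: AbstractNode[];
}

function toDescription(value: unknown): Description {
  const nodes: AbstractNode[] = [];
  const seen = new Map<object, Ref>(); // preserves shared references and cycles

  function push(node: AbstractNode): Ref {
    nodes.push(node);
    return nodes.length - 1;
  }

  function visit(v: unknown): Ref {
    if (typeof v !== "object" || v === null) {
      if (v === undefined) return push({ kind: "undefined" });
      if (v === null || typeof v === "number" || typeof v === "string" || typeof v === "boolean") {
        return push({ kind: "primitive", value: v });
      }
      throw new TypeError(`cannot describe a ${typeof v} in this sketch`);
    }
    const known = seen.get(v);
    if (known !== undefined) return known;
    if (Array.isArray(v)) {
      const items: Ref[] = [];
      const ref = push({ kind: "array", items });
      seen.set(v, ref); // register before recursing so cycles terminate
      for (let i = 0; i < v.length; i++) items.push(visit(v[i]));
      return ref;
    }
    const entries: [string, Ref][] = [];
    const ref = push({ kind: "object", entries });
    seen.set(v, ref);
    for (const key of Object.keys(v)) {
      entries.push([key, visit((v as Record<string, unknown>)[key])]);
    }
    return ref;
  }

  return { root: visit(value), nodes };
}

function fromDescription(description: Description): unknown {
  const built = new Map<Ref, unknown>();

  function build(ref: Ref): unknown {
    if (built.has(ref)) return built.get(ref);
    const node = description.nodes[ref];
    switch (node.kind) {
      case "undefined":
        return undefined;
      case "primitive":
        return node.value;
      case "array": {
        const result: unknown[] = [];
        built.set(ref, result); // register before recursing so cycles resolve
        for (const item of node.items) result.push(build(item));
        return result;
      }
      case "object": {
        const result: Record<string, unknown> = {};
        built.set(ref, result);
        for (const [key, child] of node.entries) result[key] = build(child);
        return result;
      }
    }
  }

  return build(description.root);
}

const shared = { n: 1 };
const graph: { a: object; b: object; self?: unknown } = { a: shared, b: shared };
graph.self = graph; // a cycle, which plain JSON could not express

const rebuilt = fromDescription(toDescription(graph)) as typeof graph;
console.log(rebuilt !== graph);        // true
console.log(rebuilt.a === rebuilt.b);  // true: sharing is preserved
console.log(rebuilt.self === rebuilt); // true: the cycle is preserved
console.log(rebuilt.a !== shared);     // true: no reference into the source graph
```

Whether an implementation ever materializes such a table is beside the point. The replication semantics are fixed by what the description can and cannot express.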

@allenwb I think we are in a complete agreement, it is just that the terminology is in the way of our communication.
Structured clone has always been about inter-vat messaging (in fact that is the implementation at least in Blink/V8), and you are completely right that:

I'd say that's what the Structured Clone specification should be all about. Which values can be replicated in a different Vat with what degree of fidelity.

I completely agree with this, and that has always been the intent of this spec.

(the result of structured clone algorithm should have no references to objects in source vat if source vat is not the same as a target vat).

I'd say that there should be no exceptions, like the above.

Right I didn't mean that structured clone should behave differently if source vat is the same as the target vat (that would be horrible). It is just that if the target vat is the same as source vat then the invariant that the result of structured clone has no references to the objects in source vat has no chance of holding (there is only one vat involved in this case :))

Instead of structuredClone(object, targetRealm) -> object

This couldn't possibly describe the inter-vat case because the caller of this function has no way to directly represent the return value if it is an object in a different vat.

structuredClone is not any kind of callable function here, it is a spec algorithm. If vat was a real spec notion, then we could write its signature as follows:

structuredClone(object, sourceVat, targetRealm, targetVat) -> object

with the invariant that object belongs to the sourceVat, targetRealm belongs to the targetVat and the resulting object belongs to the targetVat. We still need targetRealm to properly construct various JS objects (targetRealm defines e.g. the Object.prototype value etc).

As such, I believe this can work [this is how the HTML5 spec works, and it is proven to be implementable for inter-vat], but of course it might be buggy (since it is hard to ensure that sourceVat does not accidentally leak into the result).

I like the idea of using an abstract model instead of blobOfBytes! I trust we can then avoid any talk about vats, and avoid specifying the wire format in any form, and still provably support inter-vat.

@dslomov-chromium

I think we are in a complete agreement, it is just that the terminology is in the way of our communication.

Excellent! Terminology is very important at this level of spec. writing where we are dealing with slippery concepts that many people are not used to thinking precisely about. Hence my concern about how you were using "Realm" in this context.

structuredClone(object, sourceVat, targetRealm, targetVat) -> object
with the invariant that object belongs to the sourceVat, targetRealm belongs to the targetVat and the resulting object belongs to the targetVat. We still need targetRealm to properly construct various JS objects (targetRealm defines e.g. the Object.prototype value etc).

Yes, but if we did we would also have to make many other changes to the ES6 specification language where objects (and other values) are implicitly constrained to a single domain. This would add a lot of complication to the ES spec. I'd like to avoid those complications--forever, if possible. I think Vat is a very useful concept in something like the HTML spec. that needs to talk about multiple isolated ES execution environments. But I'm hopeful it doesn't need to be brought into the ES spec., which I'm hopeful can be limited to dealing with one implicit Vat plus the "outside world".

I like the idea of using an abstract model instead of blobOfBytes! I trust we can then avoid any talk about vats, and avoid specifying the wire format in any form, and still provably support inter-vat.

Yes, if you go down that path then it should be irrelevant whether or not you are deserializing back into the same or a different Vat.

Hope this has all been helpful.

I think if you want to define document.domain semantics properly you need to define Vat in ECMAScript. I recommend talking to e.g. @bholley as we have something just like vats in Firefox and they are a bit ugly at the edges.

I think if you want to define document.domain semantics properly you need to define Vat in ECMAScript.

I don't know if that's true, at least not explicitly. The HTML5 spec has the notion of "unit of related similar-origin browsing contexts" which basically describes the vat of minimal cardinality for the web platform (assuming that you are willing to remote the limited API surface that is exposed cross-origin on Window and Document). If you want document.domain to work right, everything that might end up same-origin needs to be in the same vat. But this constraint pretty much falls out of other ones, and probably doesn't need to be normative.

In Gecko, pretty much everything but workers lives in a single vat. We have built-in security membranes at Realm boundaries that give us the isolation we need. We also make use of intra-vat structured cloning [1] from self-hosted DOM APIs, frontend, and addon code, though that's unlikely to be relevant to the spec.

The wire format should definitely remain an implementation detail.

[1] https://developer.mozilla.org/en-US/docs/Components.utils.cloneInto and https://developer.mozilla.org/en-US/docs/Components.utils.exportFunction

@bholley fair. I guess what that argues for is that HTML, if done right, needs to define the security membranes (which seems super annoying).

@annevk

that argues for is that HTML, if done right, needs to define the security membranes

I took away the opposite conclusion from what @bholley said. He seems to be saying that it is the capability of dynamically removing cross-origin security boundaries by changing document.domain that forces things that otherwise could be hosted in independent Vats into a single common Vat.

If that capability could be removed from HTML (and he seems to suggest that it perhaps could be) then there wouldn't seem to be the need for HTML to define that security membrane. Implementations that choose to use a single common Vat (ie, current Gecko) would still have to implement a security membrane but it would be an implementation-specific detail. The isolation semantics of such a membrane would still have to be exactly the same as if separate Vats were used.

I would love to remove document.domain but I'm doubtful that we can. Nevertheless, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1034120 with the suggestion to start warning when people set it.

I think even without document.domain Chrome is running into problems isolating <iframe> resources from truly distinct origins, but hopefully they can overcome those.

@bholley fair. I guess what that argues for is that HTML, if done right, needs to define the security membranes (which seems super annoying).

Only if you want revocation. Gecko does this, but the spec and Blink don't. Their model is that Window and Location do security checks, and if you have references to other objects (that were previously same-origin but now cross-origin, thanks to document.domain), those references keep working until the object graph leads you back to Window or Location.

This makes document.domain pretty hard to use safely, but @Hixie and @abarth aren't really interested in complicating their respective spec and implementation to improve them, which I understand. When we met last December, everyone agreed that we'd like to eventually remove document.domain, even if it happens ~10 years down the line. This seems like the best road to compat.