Order dependence for equality
bakkot opened this issue · 116 comments
From the readme:

> if they have the same values stored and inserted in the same order, they will be considered strictly equal

This is surprising to me. Contrast Python dicts or PureScript records.
It's also just a weird path-dependence in values: I would not expect it to matter whether I construct something by doing `(const { a: 0 }) with .b = 1` or `(const { b: 1 }) with .a = 0`.
I got some other negative feedback on the order. Basically it's there to be "mindful" of the engines, so we have something that remotely looks like the hidden-class implementation optimization. But I think that last consideration could be brought up later. I'm gonna make another pass on the proposal text at some point soon; I'll ping this issue when I do!
Another interesting consideration is that I expect `object with .key = object.key` to be a no-op. However, if key ordering matters, then wouldn't `object with .key` push `key`'s "order" to last, and thus `(object with .key = object.key) !== object` on many occasions?
If order matters then yes, and I agree it'll generate a lot of confusion. I was trying to be mindful of the engine before the user, it'll get updated!
I think it would be better if instead, for const objects, enumeration order was alphabetical (or at least a consistent ordering), e.g. `Object.keys(const { x: 10, y: 20 })` would be the same as `Object.keys(const { y: 20, x: 10 })`.
Sets, Maps, and objects (roughly) all follow insertion order for enumeration - that seems the precedent to follow.
Insertion order isn't likely to matter when comparing two Sets in the Set methods proposal - so I'd assume that's the precedent to follow for comparison too.
If we did have `Object.keys` returning different sequences for those two objects but `===` still holding, do we think this is a bad thing?
There's a related (but inverse) example in the language already, where `NaN !== NaN` despite the fact `NaN` will act identically in all cases.
For this case it'd be the opposite way: two objects could be `===` but not behave identically in all situations. In which case, would we need `Object.is(const { x: 10, y: 20 }, const { y: 20, x: 10 })` to return `false`?
To clarify, we currently have these rules that hold true:

- `===` implies you can use either the LHS or RHS and the behaviour will be identical
- But `!==` doesn't imply the reverse, because of `NaN`
- But `Object.is` implies two values are the same in all circumstances, including `NaN`, and you can pick either and get the same behaviour in all cases
But making `const { x: 10, y: 20 } === const { y: 20, x: 10 }` would break the first implication. This might be okay, but my feeling is we'd never want to break the `Object.is` implication as well.
Order dependence can be removed if you're willing to allow some extra complexity, ranging from O(n) best case through O(n log n) to O(n²) worst case (depending on implementation), with the average case somewhere in the middle. It's worth noting that Clojure, which uses immutable hash maps for nearly everything, also uses an order-independent equality algorithm for its maps and sets and does not have any significant performance issues from doing that. (I don't think it even offers a native order-dependent equality algorithm for its hash map data type.) They do not guarantee order, but I'm not sure how much of an effect that has on their algorithm.
@ljharb, you can't both have insertion order for iteration and make insertion order not matter for equality comparison. I'm with @Jamesernator - alphabetic order, possibly with nonnegative integer keys first, is the way to go.
(The set methods proposal isn't adding an "equals" method, so I'm not sure what you're referring to there, but in any case that would be a very different thing from two things being `===`. The horrible edge case that is `-0` aside, `===` implies that the left and right are the same value, which means they should not be distinguishable under any circumstances.)
@bakkot You can't avoid being able to distinguish them as long as you can iterate them - even if the iteration order is undefined, you'll still be able to tell many `===` records apart by iteration order. And `===` doesn't always imply "same value":

- `NaN !== NaN`, one of my biggest criticisms of IEEE-754.
- An embedder might choose to expose a pointer to a JS object handle to JS code as a number, and if this handle happens to be a string, two pointers to a `===` string might still be `!==`. (It's a similar story with bigints.)

As for the stated semantics, it's intended to express non-coercing equality, not necessarily complete indistinguishability. Checking that is the goal for `Object.is`, not `===`.
Edit: Filed #20 to ask about this.
@isiahmeadows

> You can't avoid being able to distinguish them as long as you can iterate them - even if the iteration order is undefined

Only if the iteration order is undefined, which it should not be. If it is defined purely in terms of which keys are present, then iterating would not allow you to distinguish them.
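For instance, with a purely key-determined order (illustrative sketch, assuming string keys sort lexicographically):

```js
// Both literals have the same key set, so both would iterate identically:
Object.keys(const { b: 1, a: 0 }); // → ["a", "b"]
Object.keys(const { a: 0, b: 1 }); // → ["a", "b"]
```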
> `NaN !== NaN`, one of my biggest criticisms of IEEE-754.

I'm familiar with equality semantics in JS, but "`===` implies same value" does not mean "`!==` implies not same value".
> An embedder might choose to expose a pointer to a JS object handle to JS code as a number

Technically this is not forbidden by the spec, but the spec should not be built on the assumption that engines are going to do so.
> Checking that is the goal for `Object.is`, not `===`.

The fact that `===` uses different semantics from `Object.is` is one of the worst parts of the language. We are not going to add any new values for which the two are not equivalent.
I think the way this is going to end up is the following:

- Order will not matter for comparison
- Enumeration is still in insertion order

This might change or be reviewed later on as we get closer to actual implementation (if this proposal ever goes that far).
I feel that many of the listed goals for value types will not actually be upheld if that is the case. A lot of the stated goals, I feel, fundamentally rely on the idea that:

> Given a pure function f, and value types A and B, if A === B, then f(A) === f(B)

However, if A and B can be `===` but return different results for `Object.keys()` (another way of saying that their iteration order is different), then no guarantees can be made about `f` anymore. As such, many algorithms that want to rely on being able to "early return" when data is cheaply compared to be equal will not actually be able to rely on `===` to do so. As a trivial example, imagine a React view that creates a list from the keys and values of an object. If this iteration is not guaranteed, then you can't simply say "I don't need to re-render if old-model === new-model".
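As a concrete sketch of the early-return pattern that breaks (a hypothetical view layer, not from the proposal text; the same code works with today's objects):

```js
// Early-return renderer: skips work when the model is === the last one.
// If two === records could iterate in different orders, the skipped
// render might correspond to a differently ordered list than the one
// actually on screen.
let lastModel;
let lastOutput;
function render(model) {
  if (model === lastModel) return lastOutput; // assumes f(a) === f(b) when a === b
  lastModel = model;
  lastOutput = Object.keys(model).map((key) => `${key}: ${model[key]}`);
  return lastOutput;
}
```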
That's a good point: I removed all ordering considerations from the proposal for now, and we can keep discussing in this issue.
We decided to have a stable key ordering: @rickbutton is working on it now!
speaking of! #25
@ljharb The explainer says that record fields are sorted, and it sounds like this is not done by the order they are present in the literal.
I'm still worried that with a sorted order, `===` won't be easy to make cheap and fast with existing object representations in engines, and people in this space seem to really want cheap and fast deep equality between objects. The short-term worry is that the slowness of `===` hurts adoption.
I can envision how it can be made fast with different object representations, but the current hidden class / shape "linear history of property insertion" representation is pretty pervasive. Speculative optimizations are often checked for validity with a single shape pointer, for example.
It is attractive to have these const value things be like extensible record types from other languages, but that's a big departure from objects.
Without sort order we'll have a worse problem that prevents adoption: the comparison may be fast but it isn't useful (or at least, isn't any more useful than current objects). Per my example above, if you can't rely on f(a) === f(b) if a === b, then the utility of these objects decreases significantly. You can no longer rely on early return optimizations when a === b, so it becomes unclear what the benefit of these objects really is outside of "engine optimization" (as opposed to algorithm optimization that the users can employ). That is to say, maybe using immutable objects gets you "under the hood" speedups, but they can't really be safely reasoned about from the outside as being any safer than normal objects. In fact, a frozen mutable object is arguably "safer" in this regard than these immutable objects, because if frozen(a) === frozen(b), then (as far as I know) you can safely early exit, while you wouldn't with these immutable objects.
I hear you. I also certainly don't advocate for a world where `Object.keys` order can differ for two `===` objects. Let's see if there's any engine pushback. If not, then great!
I'm hopeful that engines would be able to implement this with just a tweak to how hidden class transitions work: the new hidden class can remain a function of the current hidden class and the new key; it would just be a different function.
`===` would still be slower than it is for objects, because you have to check which values the keys have in addition to which keys are present, but I don't think the fact that keys are in sorted order should impose any additional slowness to `===`, as long as you can make the hidden class transitions take order into account.
But this is definitely a question for implementors.
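As a toy model of that tweak (purely illustrative; real engine shape tables look nothing like this): make the transition function insert the new key into sorted position, so every insertion order converges on the same hidden class.

```js
// Shape table keyed by the *sorted* key list: adding "a" then "b"
// reaches the same shape object as adding "b" then "a".
const shapes = new Map();
function transition(currentKeys, newKey) {
  const keys = [...currentKeys, newKey].sort(); // assumes plain string keys
  const id = keys.join("\u0000");
  if (!shapes.has(id)) shapes.set(id, { keys });
  return shapes.get(id);
}

const ab = transition(transition([], "a").keys, "b");
const ba = transition(transition([], "b").keys, "a");
console.log(ab === ba); // true — the shape is order-independent
```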
@bakkot Agreed that the value checking isn't likely to matter here.
It also might just be a tweak that works out cleanly; that'd be great. To be concrete, it'd be nice if an initial implementation in current engines didn't incur a bunch of perf cliffs. For example, I'm worried about JIT code being invalidated due to "insertions" when creating new immutable objects, because their hidden class has a different transition than just "append".
My apologies, and I do not mean to be argumentative, but I think our position vis-à-vis implementors should be that this is a deal breaker - in other words, I think this feature isn't really worth it without sorted keys. My reasoning is that without sorted keys, the "best practices" around using immutable objects become incredibly sophisticated and subtle. In fact, I believe that "immutable objects" become one big expert-level gotcha. As a framework writer, I know that I have to essentially ignore their existence with user-supplied objects for the reasons listed above, but as a library user, even for something as simple as a memoizer, I have to hope that the library author is very aware of these issues. It's already the case that many memoizers fail with existing gotchas (for example, many erroneously believe they already have memoized results for "hasOwnProperty" or other keys that exist on Object's prototype); this would add yet another layer of submarine bugs.
The simplest proof-of-concept case of course is just `memoize(Object.keys)`. Although this memoized function does not have a tremendous amount of utility, it does highlight the difficulty of implementing such a memoizer and the contradictory nature of the situation: if insertion order matters, then the memoizer should return the insertion-order listing of the keys - but it will very likely fail to do so for indistinguishable "different" objects.
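A sketch of that proof-of-concept (hypothetical semantics: records work as `Map` keys via the proposal's equality, iteration is insertion-ordered, but `===` is order-insensitive):

```js
// memoize(Object.keys): two "equal" records collide in the cache, and one
// of them gets the other's key ordering back.
function memoize(fn) {
  const cache = new Map(); // Map lookup uses SameValueZero, so === records collide
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg);
  };
}

const keysOf = memoize(Object.keys);
// keysOf(#{ a: 0, b: 1 });  // → ["a", "b"]
// keysOf(#{ b: 1, a: 0 });  // → ["a", "b"] from the cache, even though
//                           // direct iteration would have given ["b", "a"]
```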
As I mentioned earlier, without sorted keys I'd actually recommend frozen objects to users instead, where the expectations are more straightforward - it is not a "content-addressable" object, it is merely a non-changeable object, where === works exactly the same as in all other contexts (that is to say, two "structurally identical", but different, objects are always !==). I don't mean this as a cop-out, but rather saying that we have an existing feature that is easier to understand (and more consistent in my opinion) for what this proposal would provide without sorted keys.
I do have a few questions about sort order:

- How should integer index and symbol properties fit in relative to strings? I feel integer index properties should come first, symbols should come last. (Symbol order can come into play with `Object.getOwnPropertySymbols`.)
- How should integer index properties be sorted relative to each other? The obvious choice is just lowest to greatest.
- How should strings be sorted relative to each other? The obvious choice is just lexicographic order.
- How should symbols be sorted relative to each other? You could do insertion order, but `const {[sym1]: "foo", [sym2]: "bar"} === const {[sym2]: "bar", [sym1]: "foo"}` is something you would expect to evaluate to `true`, and so you would require either a total ordering of symbols for immutable objects or a global hash map tracking key order for each set of keys.
The last one would encounter a major slowdown, but the best way I can think of to handle it without substantial perf loss is to keep an ordered hash map of symbol keys and GC references, and:

- When a new symbol is not previously used, append it to the set.
- When a symbol is removed from the last immutable object, or if all remaining immutable object references are collected, remove it from the set.
- The key index of a symbol represents the sort order.
This would have to be specified as an implementation-dependent hook, tied into weak references, as it has similar security and observability concerns. One could check, using a local `const testSym = Symbol()`, whether a given key `sym` is used in any immutable objects by checking the output of `Object.getOwnPropertySymbols(const {[testSym]: "foo", [sym]: "bar"})`: it returns `[sym, testSym]` if such an object exists with a reference to that symbol, `[testSym, sym]` if not. However, without the GC hook, it amounts to effectively a required memory leak, something that itself IIRC was a past issue with tagged template literals (specifically with the template arrays themselves), and it would be especially annoying on long-running server-side processes.
This overhead is also going to be an issue in low-resource IoT environments (e.g. Moddable, Tessel) and similar environments with stringent resource restrictions.
Regarding Symbol ordering, perhaps this went without saying, but I think you can avoid the complex cases a large part of the time by stating that you first sort on the symbol's `description`, and only if those are equal do you then fall back to the fancy techniques described above. This means that all the common cases of merely wanting to include fairly normal Symbols (which often already go through a great deal of trouble to self-uniqueify for the purposes of backwards compatibility in transpiled cases) should be no less performant than string keys. I believe it would be safe to make this a spec requirement (that is to say, maybe we leave the case of `Symbol("x") !== Symbol("x")` implementation-dependent to find an ordering, but `Symbol("a")` always < `Symbol("b")`).
From an implementor's perspective, it might be possible to create further simplifying cases. For example, if the source/position of a Symbol were kept at creation time, this would also significantly reduce the cases of otherwise identical Symbols: `Symbol("a")` [from a.js:100] < `Symbol("a")` [from b.js:4]. You could of course imagine an even more complicated stack-based approach to handle "symbol factories", etc., leading to only truly pathological cases like `[1,2,3].map(() => Symbol())` requiring disambiguation (even then, you could maybe slap a timestamp on them?).
That being said, I think it might be alright to keep per-object (partial) insertion order for Symbols with identical descriptions in objects. So:

`sym1 < sym2` in `object` if `sym1.description < sym2.description` or `insertion_order(sym1, object) < insertion_order(sym2, object)`

That is to say, I think it might not be unreasonable to say that:

```js
const a = Symbol("a");
const a2 = Symbol("a");
const b = Symbol("b");
#{ [a]: 1, [b]: 2 } === #{ [b]: 2, [a]: 1 }
#{ [a]: 1, [a2]: 2, [b]: 2 } === #{ [b]: 2, [a]: 1, [a2]: 2 }
// BUT:
#{ [a]: 1, [a2]: 1, [b]: 2 } !== #{ [a2]: 1, [a]: 1, [b]: 2 }
```
This certainly does not break any promises of content-addressable objects: for every case that `a === b`, `f(a) === f(b)`. In the third case above, `a !== b`, and `f(a) !== f(b)` (case in point when `f === Object.getOwnPropertySymbols`). (For the record, even if `f(a) === f(b)`, this would still not break any promises, as it is not an iff.) To further clarify, we could have made per-object insertion order matter across the board for ALL keys as a way of trivially maintaining the `if a === b then f(a) === f(b)` rule, but that would reduce the utility of immutable objects, hence wanting to employ sort in as many cases as possible. Here we are simply stating that in the very rare case of objects with Symbols that have identical descriptions, but are inserted in opposite order, they represent distinct objects.
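To make that rule concrete, here's a minimal comparator sketch (hypothetical; `insertionIndex` stands in for the per-record insertion positions, which a real implementation would track internally):

```js
// Sort a record's symbol keys by description, falling back to this
// record's own insertion order only when descriptions tie.
function compareRecordSymbols(sym1, sym2, insertionIndex) {
  const d1 = sym1.description ?? "";
  const d2 = sym2.description ?? "";
  if (d1 < d2) return -1;
  if (d1 > d2) return 1;
  return insertionIndex.get(sym1) - insertionIndex.get(sym2);
}
```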
@tolmasky Didn't think about sorting lexicographically by `.description`. That certainly gives a nice total (Edit: partial) order without the overhead, but yeah, symbols with identical names are a problem.
I feel non-array immutable objects should be compared via unordered equality, basically this:
```js
// Basic spec
const _hole = {}
const _get = (o, k) => Object.prototype.hasOwnProperty.call(o, k) ? o[k] : _hole

function constObjectEqual(a, b) {
    if (Object.getPrototypeOf(a) !== Object.getPrototypeOf(b)) return false
    const aKeys = Reflect.ownKeys(a)
    const bKeys = Reflect.ownKeys(b)
    if (aKeys.length !== bKeys.length) return false
    const aSet = new Set(aKeys)
    for (const key of bKeys) {
        if (!aSet.has(key)) return false
        const aDesc = Reflect.getOwnPropertyDescriptor(a, key)
        const bDesc = Reflect.getOwnPropertyDescriptor(b, key)
        if (aDesc == null) return false
        if (bDesc == null) throw new TypeError()
        if (
            _get(aDesc, "get") !== _get(bDesc, "get") ||
            _get(aDesc, "set") !== _get(bDesc, "set") ||
            _get(aDesc, "value") !== _get(bDesc, "value")
        ) return false
    }
    return true
}
```
```js
// Baseline optimized: assumes in-order first, but equivalent to above
const _hole = {}
const _get = (o, k) => Object.prototype.hasOwnProperty.call(o, k) ? o[k] : _hole

function constObjectEqual(a, b) {
    if (Object.getPrototypeOf(a) !== Object.getPrototypeOf(b)) return false
    const aKeys = Reflect.ownKeys(a)
    const bKeys = Reflect.ownKeys(b)
    if (aKeys.length !== bKeys.length) return false
    for (let i = 0; i < aKeys.length; i++) {
        if (aKeys[i] === bKeys[i]) {
            const aDesc = Reflect.getOwnPropertyDescriptor(a, aKeys[i])
            const bDesc = Reflect.getOwnPropertyDescriptor(b, aKeys[i])
            if (aDesc == null || bDesc == null) throw new TypeError()
            if (
                _get(aDesc, "get") !== _get(bDesc, "get") ||
                _get(aDesc, "set") !== _get(bDesc, "set") ||
                _get(aDesc, "value") !== _get(bDesc, "value")
            ) return false
        } else {
            // Key order diverged: compare the remaining keys as a set
            const aKeySet = new Set()
            let j = i
            do {
                aKeySet.add(aKeys[j])
            } while (++j !== aKeys.length)
            do {
                if (!aKeySet.has(bKeys[i])) return false
                const aDesc = Reflect.getOwnPropertyDescriptor(a, bKeys[i])
                const bDesc = Reflect.getOwnPropertyDescriptor(b, bKeys[i])
                if (aDesc == null || bDesc == null) throw new TypeError()
                if (
                    _get(aDesc, "get") !== _get(bDesc, "get") ||
                    _get(aDesc, "set") !== _get(bDesc, "set") ||
                    _get(aDesc, "value") !== _get(bDesc, "value")
                ) return false
            } while (++i !== bKeys.length)
            break
        }
    }
    return true
}
```
The above algorithm looks complicated and slow, but it's easily optimized for the most common cases:

- If both operands share the same type map, you can just compare the values and skip the map altogether.
- You can start iteration assuming keys are in the same order and check for equality, and bail out to the hash-map-based slow path only if the keys themselves differ. This is implemented in my baseline-optimized version above.
- If neither object is a proxy, you could match the tail in order after the head fails, too, and it'd still just work. This also saves some time and lets you avoid an allocation, with minimal runtime cost.
Seems like we're approaching agreement here that we'd like a total ordering on property keys, in order to enable records to have "sorted" keys, which lets `===` return `true` even if the insertion order is different, while maintaining the expected lack of observable differences between values which are `===` (apart from a fixed set of legacy floating-point exceptions). I like the framework proposed by @isiahmeadows: we start by imitating ordinary objects, with integer-indexed property keys first, then strings, then symbols. Strings can be sorted in code point lexicographical order, and the hard part is symbols.
A simple mechanism to create a total order on symbols (within an agent) would be to have a per-agent incrementing counter, starting at 0, with each symbol getting a unique counter value. Symbols are compared according to the counter. This would be cheap as well as simple. Agents are single-threaded, so there is no ambiguity about how the counter would work.
Would such a mechanism constitute some kind of information leak? I don't quite see what information it'd be leaking; it seems fine to me. If people do have concerns, do they have any ideas for alternatives? cc @erights
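Modeled in userland (illustrative only; the engine would store the counter value on the symbol itself rather than in a `Map`):

```js
// Per-agent creation counter: every symbol gets the next id at creation,
// and record keys would compare by that id.
let nextSymbolId = 0;
const symbolIds = new Map();
function createSymbol(description) {
  const sym = Symbol(description);
  symbolIds.set(sym, nextSymbolId++);
  return sym;
}
function compareByCreation(a, b) {
  return symbolIds.get(a) - symbolIds.get(b);
}

const s1 = createSymbol("x");
const s2 = createSymbol("x");
console.log(compareByCreation(s1, s2) < 0); // true: s1 was created first
```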
> A simple mechanism to create a total order on symbols (within an agent) would be to have a per-agent incrementing counter, starting at 0, with each symbol getting a unique counter value. Symbols are compared according to the counter. This would be cheap as well as simple. Agents are single-threaded, so there is no ambiguity about how the counter would work.

An atomic counter would still be reasonably quick. On Intel, getting such an ID is two instructions, `mov reg, 1` + `lock xadd [addr], reg`, and on ARM, it's three: `ldrex` + `add rN, rN, 1` + `strex`. And although those do decode to more micro-ops, the cost of those is dwarfed by all the other work you'd need to do anyways.
It might be necessary to require an atomic counter if symbols are ever exposed outside the agent that created them. This could potentially be required if symbol keys are retained in const objects sent over `postMessage` or similar.
Separately, if this helps inform anything:

- Clojure's iteration order is insertion order (I believe) for up to 8 elements, undefined for greater.
- Immutable.js's `Map` iteration order is undefined, but stable over referential equality.
- Most current implementations, including both above, use persistent hash array mapped tries (HAMTs), but there's still active research out there attempting to improve upon that. HAMTs have an iteration order based on internal object hashes, and for obvious reasons, this shouldn't be exposed to JS.
@isiahmeadows Symbols cannot be communicated via `postMessage` because the structured clone algorithm does not allow Symbols.
@rickbutton I'm aware of this. I'm saying hypothetically if this restriction were to be lifted for whatever reason.
ah, yes! understood @isiahmeadows
> A simple mechanism to create a total order on symbols (within an agent) would be to have a per-agent incrementing counter, starting at 0, with each symbol getting a unique counter value. Symbols are compared according to the counter.

I think we should first emit all symbols which have descriptions, according to code point lexicographical order of the description, breaking ties using this counter. Using descriptions first also means we don't have to specify a creation order for all of the built-in symbols, as long as we ensure they all have unique descriptions.
Also, I think we would need to specify that Symbols created with `Symbol.for` must come before other Symbols (or, if we sort by description first, before other Symbols with the same description). Otherwise `Symbol.for` could be used as a side channel: I could determine if anyone has ever called `Symbol.for('a')` by observing the iteration order of `#{ [Symbol('a')]: 0, [Symbol.for('a')]: 0 }`.
A global counter does introduce new mutable global state, so I'm not sure if it would fly with @erights. I don't think you can build a side channel out of it in the traditional sense (given the above tweaks), but you can build a kind of sneaky communications channel: Alice makes two symbols and passes them to Bob, and Bob can derive one bit of information by discovering (using this mechanism) which was created first. That's... probably OK?
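To spell out that side channel (a hypothetical probe, assuming same-description ties were broken purely by the creation counter, without the `Symbol.for`-first rule):

```js
// If Symbol.for(name) sorts *before* a symbol we just created, it must
// have an older creation index — i.e. someone registered it before us.
function wasEverRegistered(name) {
  const probe = #{ [Symbol(name)]: 0, [Symbol.for(name)]: 0 };
  return Object.getOwnPropertySymbols(probe)[0] === Symbol.for(name);
}
```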
> I think we should first emit all symbols which have descriptions, according to code point lexicographical order of the description, breaking ties using this counter. Using descriptions first also means we don't have to specify a creation order for all of the built-in symbols, as long as we ensure they all have unique descriptions.

That could work for `Symbol.for`, but how would it work for this?

```js
const symA = Symbol()
const symB = Symbol()
```

I see stuff like that a lot in the wild: symbols without descriptions.
Also, what about something like this?

```js
// Suppose a class like this, but doing something a lot more involved.
class Mirrored {
    constructor(file) { this._file = file; this._data = Symbol("data") }
    async init(v) {
        // getDataFromFile: hypothetical helper standing in for
        // "get data from file somehow"
        return #{ ...v, [this._data]: await getDataFromFile(this._file) }
    }
    get(v) { return v[this._data] }
}

const SomeKey = new Mirrored("/path/to/file")
const AnotherKey = new Mirrored("/path/to/another/file")

// Later, initialize
const object = await AnotherKey.init(await SomeKey.init({
    // whatever data happens to be relevant
}))

// And later, iterate for some reason
for (const [key, value] of Object.entries(object)) {
    // do stuff...
}
```

Edit: Prematurely sent
@isiahmeadows, as I said, you would break ties using the counter.
@bakkot Sorry, missed that bit.
An alternative solution to the Symbol problem is to require that the iteration order of indistinguishable Symbols be randomized every time it is queried.
Guaranteed randomization would once again break one of the more useful properties of content-addressable objects: that `f(a) === f(b)` if `a === b`. Essentially, you'd have a strange immutable object that represents a truly random variable. To implement some sort of bizarro coin-toss algorithm:

```js
const heads = Symbol();
const tails = Symbol();
const coin = #{ [heads]: heads, [tails]: tails };

function toss(coin) {
    return Object.getOwnPropertySymbols(coin)[0] === Object.getOwnPropertySymbols(coin)[0] ? heads : tails;
}
```

Note that the above would return a random answer only with immutable objects, but strangely returns a consistent answer with normal JavaScript objects (which would base `getOwnPropertySymbols` on insertion order).
Is there a reason insertion order can't be used like it is for other objects? One would think that would be efficient enough, plus it doesn't require a whole new line of engine optimizations to account for them. Plus, it's worth noting that modern strongly-typed functional languages don't use HAMTs behind the scenes for record types either - they just use C-style structs with a possible pointer to type info if necessary.
Edit: I mean iterating in insertion order here, not just with `===`.
Edit 2: Just ignore this bit. It's off-topic.
Perhaps I misunderstand the question, but the original reason for this bug is that insertion order can't be used for `===` (or iteration) because it would cause seemingly identical objects to have different properties, or alternatively force objects to not be `===` depending on insertion order.
Unless we mean "global insertion order" and this is just referencing "the counter" again?
If the question is just "insertion order", though, then we're right back to the problem of:

`(const { a: 0 }) with .b = 1 !== (const { b: 1 }) with .a = 0`

Or, if they are `===`, then breaking `f(a) === f(b)` when `a === b` (since iterating the keys of a and b would differ).
> I like the framework proposed by @isiahmeadows that we start by imitating ordinary objects, with integer indexed property keys, then strings, then symbols. Strings can be sorted in code point lexicographical order, and the hard part is symbols.

That's unobservable. You can't do `new Proxy(constObject, ...)`, because const value types are, to my understanding, not objects in the way ordinary objects and function objects are.
This issue was initially asking if `#{a: 1, b: 2} === #{b: 2, a: 1}` should evaluate to `true` or `false`.

- `true` means order doesn't matter, which is what I want.
- `false` means order does matter, which is what gave rise to the question of "what's the order?".

I feel this distinction got lost in the discussion, and I'm hoping I could drive this back on topic.
So, I think we are agreed that `#{a: 1, b: 2} === #{b: 2, a: 1}` should evaluate to `true`, but I guess (at least to me) that means that order does matter, in the sense that we need them both to return the same ordering of keys and symbols in order for the `===` to be meaningful. If they `===` but behave differently, then it's not a full resolution to the problem.
> [...] but I guess (at least to me) that means that order does matter, in the sense that we need them both to return the same ordering of keys and symbols in order for the === to be meaningful.

This is incorrect. Key order is unobservable over equality if `#{a: 1, b: 2} === #{b: 2, a: 1}` evaluates to `true`, as I pointed out in my last comment. If it weren't for the fact you could iterate the properties of const objects, the key order wouldn't even need to be spec'd and would just be an implementation detail.
Keep in mind, if key order is not important, only the collection of properties, it would be a perfectly reasonable implementation to use a `(HAMT, List<HAMTNode>)` pair to represent const objects. The first, a hash array mapped trie, holds all the values, and the second, a list of child HAMT nodes in order of insertion, holds the set of properties. As long as the HAMT and the property key list remain aligned, equality is just HAMT equality, a reasonably fast thing to check. Iteration will require walking the node list to visit the properties in the right order, because the HAMT itself is ordered by hash value, but that's the only thing the list exists for.
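A toy model of that pair, with a plain `Map` standing in for the HAMT (illustrative only): equality consults only the map, while iteration walks the insertion-ordered key list.

```js
class ConstObjectModel {
  constructor(entries) {
    this.map = new Map(entries);             // stand-in for the HAMT
    this.keyList = entries.map(([k]) => k);  // insertion order, iteration only
  }
  equals(other) {
    if (this.map.size !== other.map.size) return false;
    for (const [key, value] of this.map) {
      if (!other.map.has(key) || other.map.get(key) !== value) return false;
    }
    return true; // keyList deliberately ignored
  }
  *entries() {
    for (const key of this.keyList) yield [key, this.map.get(key)];
  }
}
```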
@isiahmeadows The representation is not really the interesting question at this point. The questions of interest are:

1. Is `#{a: 1, b: 2} === #{b: 2, a: 1}`?
2. Is the array given by `Object.keys(#{a: 1, b: 2})` in the same order as the array given by `Object.keys(#{b: 2, a: 1})`?
3. If so, what order is that? (There's an obvious order for string keys, but not for symbols with the same `.description`.)

I and several other people in this thread have expressed the opinion that the answer to 1 should be "yes", and that if the answer to 1 is "yes" then the answer to 2 must be as well. That leaves 3 as the remaining question with no obvious answer. Are you intending to disagree with those answers to (or dispute the premise of) 1 or 2? Are you proposing a solution for 3? I can't quite tell.
@bakkot Oh okay. I'm of the opinion 2 should be "no", just to keep it consistent with objects, and if necessary, `Object.is(#{a: 1, b: 2}, #{b: 2, a: 1})` can also return `false`. Already, `===` doesn't imply both sides are observably identical, nor does `!==` imply they're observably different (consider: `+0` vs `-0`, `NaN`s).
For registered symbols, if they are allowed at all, they must only be sorted by their string name. Any proposal that allows unregistered symbols must sort these separately from registered symbols.
If you rank unregistered symbols by incrementing a count, it would certainly be unsafe if the count itself were exposed. Between two subgraphs that should not be able to communicate, they would still be able to via this globally shared counter.
I see two possibilities:

- Certainly safe: Why do value types need to include unregistered-symbol-named properties? What if we just don't include those?
- Likely unsafe: What if we did indeed rank all unregistered symbols created within an agent, by order of creation, so they had a sort order based on their rank. Note that sorting by rank exposes only relative rank. Between two unregistered symbols that Alice has, she has no power to observe the number of unregistered symbols created that she does not have.

The info leak exposed by the second bullet above is at least hard to think about. It is certainly safer than exposing a count. OTOH, it is also introducing global mutable state that is way more observable than anything currently in the language. Before investing any more time in it, why not do the first bullet above?
I propose that values do not allow properties named by unregistered symbols. Those named by registered symbols are sorted separately, and only by the string name of those registered symbols.
My second bullet above, exposing relative rank, is indeed unsafe. It is effectively a side channel, exposing information about the computation of one agent that is not explicitly communicated to another agent. The observing agent, knowing the code of the other, can make inferences about the hidden computation, even if both are denied all abilities to sense duration. It is much like the info leak in GC that led us to quarantine the ability to make weak refs.
Privileging global symbols over normal ones seems like an antipattern; if we wanted to differentiate those, then non-global symbols should be accepted as WeakMap keys - but by your own agreement with my logic this is something that would be confusing.
Thus, I conclude that we have to allow all kinds of symbols, or disallow them all. Certainly if allowed, it's ideal to sort global symbols first with `String(a).localeCompare(String(b))`, and then sort all non-global symbols (by what comparator remains to be decided).
@ljharb I agree. We must disallow unregistered symbols. Thus, we should disallow all symbols. Allowing registered symbols, though it wouldn't violate any hard safety properties, would cause exactly the confusions you explain.
Not that I disagree with this logic, but I just want to point out that from a practical perspective, disallowing all symbols can have some potentially annoying and surprising side-effects for users looking to replace objects with const records. You would, for example, lose the ability to print them nicely in Node, since `inspect()` relies on the `require("util").inspect.custom` symbol. You can of course put one definition for `inspect.custom` into the single ConstObject class that all const records share, but not trivially differentiate them individually. Similarly, it would not allow for custom `Symbol.iterator`s and so forth.
@erights if registered symbols are ordered by string description, why is there any violation of hard safety properties?
I don't see how it'd work to order unregistered symbols by string description, since multiple unregistered symbols could share a description. You'd need something to disambiguate that case. So I don't see how we could order unregistered symbols without some sort of global ordering of unregistered symbols, which @erights is characterizing as an information leak. (Personally, I'm not totally convinced that this is a meaningful leak, though.)
Could we permit just built-in symbols (e.g., `Symbol.iterator`) and the return values of `Symbol.for` as keys, and omit all others? I do agree with @tolmasky that it'd be unfortunate to prevent Records from conforming to various Symbol-based protocols, and wonder if `Symbol.for` would be enough for user-defined Symbol-based protocols (including polyfills and this Node example).
If we do disallow unregistered symbols, then I would lean towards allowing all registered symbols, not just the built-ins, because allowing only built-ins doesn't remove the "antipattern" of allowing some symbols but not all.
Are symbols registered e.g. between workers?
@dcporter you can't post symbols of any kind to a worker, structured clone disallows symbols.
differentiating between kinds of symbols seems problematic to me; i think they should all work or none work (just like in WeakMap keys).
Just wanted to share an outside perspective from a layman.
> If order matters then yes, and I agree it'll generate a lot of confusion.
> - this comment
This thread is indexing really heavily on the idea that order will wreak havoc on accessibility to your typical JS devs. I personally think it very strange that insertion order is irrelevant for comparisons and is actually completely lost / unretrievable in all operations and uses on/of a record. This seems antithetical to most data structures in JS (according to my perhaps flawed mental model).
Sorting keys for all the things also seems to be adding a lot of complexity.
I suggest taking a step back and taking some measures to verify the premise that it will be prohibitively confusing if insertion order matters. I really think that's a big assumption. And optimizing for it has far-reaching implications with lots of trade-offs. So, there might be high ROI on validating the concern. I mean, maybe even asking on Twitter? As lame as that sounds, it might be precisely the intended audience for this change. I don't see a downside to that, honestly.
Keeping insertion order would seem to open a lot of doors, like keeping Symbols as first-class keys, reducing complexity and improving performance (from your comments). (And personally, I would love to see the full spectrum of value types as valid keys which is definitely only remotely possible in the simpler world of "order matters." <- humble plug)
> (object with .key = object.key) !== object

Regarding the above, I didn't see any mention of duplicate keys. Apologies if I missed it, but if it's not specified then that ambiguity would probably be the dominant source of confusion.
Lastly, fuckit, how about `OrderedRecord`?
Thanks for the great work on this proposal. I'm really excited about it!
Taking a step back: the reason we're talking about sorting keys is because of this principle: if two JS values are considered equal by `Object.is` (i.e., SameValue), then there should be no way to distinguish them; they should have the same behavior when every operation is performed on them. If `Reflect.ownKeys` returned two different lists of property keys, in different orders, for two values that are SameValue to each other, it would violate this principle.
We could talk about the motivation for that invariant. I don't think that every single invariant that someone may posit needs to be our top goal, but I think that this one is a good one. It gives us a solid base for a concept of what a value is: two things that are SameValue are the "same object" (even if they might be represented multiple times in the memory of an implementation).
Insertion order seems to me to be much less important in immutable structures. It's not even really insertion, it's one-time, create-time definition order. That feels to me to be unimportant - especially so compared to the kind of insane bugs that would crop up because someone reasonably expected `#{ x: 1, y: 2 }`, generated in one spot, to equal `#{ y: 2, x: 1 }`, generated in another spot (deep in a tertiary dependency etc.).
From a different perspective: as value types, records are similar to strings. It would be terrible if two identical strings that had been arrived at via different concatenations were not equal: `"a st" + "ring" === "a stri" + "ng"`.
@dcporter I agree preserving insertion order is not an important goal here. That's why, in this issue, we're talking about sorting the properties instead of preserving insertion order. We need a well-defined order of some kind because of these reasons.
> differentiating between kinds of symbols seems problematic to me; i think they should all work or none work (just like in WeakMap keys).

This will be impossible to transpile, much less polyfill, without creating a memory leak, even assuming all existing stage 1+ proposals are natively implemented.
How about this:

- `Object.is`: keys must be equal and in order
- `===`: key order is irrelevant for records

This could allow for both styles, because sometimes you do care, and sometimes you don't, but usually you don't. Some precedent for this already exists with numbers, where `===` follows IEEE 754 conventions strictly, but `Object.is` only concerns itself with observable identity (independent of bit representation):

- `+0 === -0` but `!Object.is(+0, -0)`
- `NaN !== NaN` but `Object.is(NaN, NaN)`
@isiahmeadows ability to transpile or polyfill, while nice, is not a blocker or requirement for any proposal.
@isiahmeadows We currently have a small, fixed set of values for which Object.is and === disagree. I'd prefer to not expand that set, as @bakkot has explained elsewhere.
Would it be feasible to allow all symbols but censor unregistered ones without descriptions or with non-unique descriptions within a record when natively iterating from another agent? (Could even allow at most 1 unregistered one without description.) (The censoring could result in omission or a runtime error.)
When similar questions have come up before, the answer has been that no, it's not feasible to allow some kinds of symbols but not others.
We've heard a decent amount of feedback that allowing registered+known symbols but disallowing unregistered symbols would be very confusing.
The simplest solution IMO is to disallow all symbols as Record keys, which solves the problem of having to order them.
> The simplest solution IMO is to disallow all symbols as Record keys, which solves the problem of having to order them.

I agree. There's a natural correspondence between the records and tuples proposal and what people are used to from JSON. Records and tuples include all of JSON and a bit more: NaN, Infinities, undefined, BigInt. But it is naturally understood as "filling in the holes of what you can express in JSON". By this soft criterion, it is not particularly surprising for it to omit symbols.
But there are use cases for symbols, given in this very thread. None of them requires comparing unregistered symbols without descriptions, or with descriptions non-unique within a record. That's an obscure corner case (but necessary for security); few people would run into it (and those who do would be likely to already expect it when doing weird things on purpose), a modest price for enabling the aforementioned use cases. The semantics are also quite natural: the only out-of-order situation is one whereby @@iterator would have to compare two unregistered symbols without descriptions or with the same description.
@ByteEater-pl IIRC the plan is to make records iterable similar to string-keyed maps and tuples like arrays.
Edit: not TC39, just wanted to clarify.
@ByteEater-pl I agree that there might be specific use cases where the ability to use a symbol as a key in a Record would be helpful, but I doubt that it is a common enough occurrence to offset the additional complexity of allowing some-but-not-all symbols.
From the discussions above it looks like if we were to disallow symbols as keys in records, then the question of "how to order the keys" is solved (answer: some deterministic order), which then solves the question of "does order matter for equality" (answer: no). If no one sees any problems with this conclusion, then I would love to resolve this point :)
Disallowing symbols as keys seems to me like as good of a solution as we can get in the near term. It could be quite complex to allow certain symbols and not others. Disallowing symbols in the initial version of records and tuples leaves the door open to future proposals to enable them (just as we're eliminating many other things from the scope of this project, e.g., record classes with custom wrappers). So, I'm in favor of the conclusion at #15 (comment)
Would symbol declarations throw errors? (I'd prefer this to their being silently ignored.) For example:

```js
#{ [someSymbol]: 5 } // throws "Can't have symbols in records"
```
@tolmasky good question, I would say yes for two reasons: 1. there isn't really a sane conversion from symbol->string, and 2. disallowing them now allows us to introduce symbol keys later if desired.
> Would symbol declarations throw errors

I agree that they should. Spec-wise there's an easy way to make this happen, which is to use ToString rather than ToPropertyKey.
Can someone explain why not supporting symbols is helpful for a stable property order?
If the goal is to make the proposal smaller, then it makes sense. Otherwise, can't engines just sort symbols based on some internal index or representation? The runtime knows when symbols are equal; the user code doesn't have to care about how. The user would only be exposed to this when `JSON.stringify`ing a record, which would behave the same as `JSON.stringify`ing an object with symbol keys.
> Otherwise, can't engines just sort symbols based on some internal index or representation?

This itself is the problem with allowing symbols. We would need to define a specific sort order for symbols (it couldn't be, for example, implementation-defined, in my opinion). That order would likely need to be based on the "creation order" of the symbols. If symbol property keys in records were sorted by creation order, you would be able to derive the creation order of those symbols, which leaks information about the execution of the code that creates those symbols (a side channel).
> it couldn't be, for example, implementation-defined, in my opinion

Eh, why? Why would the user care about the internal sort order if it is never exposed to them? In what way would this be different from objects? The only place the order leaks is JSON.stringify, which ignores symbol keys anyway.
Order is also exposed by `Object.getOwnPropertySymbols`, `Reflect.ownKeys`, etc.
Those would presumably have the same order they do for objects, won't they?
In objects they are ordered by property insertion order. But for records, the idea is that if you have `let a = Symbol(); let b = Symbol(); let r1 = #{ [a]: 0, [b]: 0 }; let r2 = #{ [b]: 0, [a]: 0 };`, then you would have `Object.is(r1, r2) === true`. And that implies that `Object.getOwnPropertySymbols(r1)` should be the same as `Object.getOwnPropertySymbols(r2)`, which means that the properties returned by `getOwnPropertySymbols` cannot be ordered in insertion order. So, no, it can't be the same as for objects (if you grant all of the above assumptions).
(edit: typos)
> then you would have Object.is(r1, r2) === true. And that implies that Object.getOwnPropertySymbols(r1) should be the same as Object.getOwnPropertySymbols(r2)

Why? Why would `Object.is` returning `true` imply `getOwnPropertySymbols` must return the same values in the same order?
It's because `Object.is(a, b)` means "a and b are the same thing". If you pass the same thing twice to `getOwnPropertySymbols`, you shouldn't get two different results.
> If you pass the same thing twice to getOwnPropertySymbols, you shouldn't get two different results.

Is that intuition or a spec requirement? My intuition is that getOwnPropertySymbols should return all the symbols the object/record has, in some order; I would not expect to rely on the order of said symbols. If the spec requires getOwnPropertySymbols to return in some particular order for objects - I understand the objection.
Also, if it's a requirement - can't the spec, well... require it? Can't we say "Symbol keys on records are required to always be iterated in the same implementation-defined order, such that if Object.is(recordA, recordB) is true then Object.getOwnPropertySymbols(recordA) contains the same elements as Object.getOwnPropertySymbols(recordB), and in the same order" (except in spec language)?
> Is that intuition or a spec requirement?

If some functions treat two objects differently, those objects are not the same, and so they should not be `Object.is`-equivalent. This is just an intuition, but a very strong one.
> If the spec requires getOwnPropertySymbols to return in some particular order for objects - I understand the objection.

The spec requires getOwnPropertySymbols to return symbols in the order in which they were added to the object.
> Can't we say

In general, JS has been moving away from leaving things implementation-defined, since in practice people tend to end up relying on a particular implementation, and then all engines are forced to go with that strategy forever. We'd want to specify a particular order. But which order? The only obvious one is creation order, which, as mentioned above, is a side channel.
Thanks for bearing with me :] To be clear - I am not objecting one way or another I am just trying to understand. Intuitively it sounds like an unexpected limitation.
> which, as mentioned above, is a side channel.

I'm not sure that I understand why that is a side channel, or why it is any more of a side channel than with objects?
> The spec requires getOwnPropertySymbols to return symbols in the order in which they were added to the object.

For objects - records don't have to define the same order as objects, I think?
> This is just an intuition, but a very strong one.

It sounds reasonable when presented this way.
> I'm not sure that I understand why that is a side channel, or why it is any more of a side channel than with objects?

My understanding is that it means that passing to another party the two symbols `a` and `b`, which are not otherwise distinguished, allows that party to determine in which order those symbols were created (by sticking them into a record and iterating its keys). This is a surprising piece of information about the computation which created `a` and `b`, which you probably did not intend to reveal just by handing over those symbols.
But I'll let @erights speak to that further; I don't want to misrepresent his position.
> For objects - records don't have to define the same order as objects, I think?

Yeah, didn't mean to imply otherwise. In fact, strings in records will probably be in lexicographical rather than (as for objects) chronological order. But we do need to pick some order.
It's a bit of information, indeed. Personally, I'm not quite convinced that it's significant/important enough to constitute a meaningful information leak/security issue. I'm happy to ban Symbols as Record keys now, or even commit to that in the future, but I don't understand the scenario where this "leak" would lead to a bad outcome when composing programs.
Maybe the following. But I doubt it's really harmful.

```js
// code in realm A
Symbol.for(secret_password)

// code in realm B, which is running bad code
function guess_password() {
    const symb = Symbol.for(Math.random().toString())
    for (const guess of possibilities) {
        const record = #{ [symb]: 0, [Symbol.for(guess)]: 0 }
        const firstKey = Object.getOwnPropertySymbols(record)[0]
        if (firstKey !== symb) {
            // which means the symbol @@guess was registered before
            // symb was created
            return firstKey // same value as secret_password
        }
        // otherwise the symbol @@guess was created after symb;
        // since the order of symbols requires a globally shared
        // registry, this round of guessing is wrong
    }
}
```
> Maybe the following. But I doubt it's really harmful.

I am not sure that's really a side channel, since the global symbol registry is, well... global (its whole goal is to leak this sort of info across realms, isn't it?).
IIRC, currently you cannot know the order of symbol creation. If symbols can be generally sorted, this can leak some information.
Even revealing the relative order in which two symbols were created is a concern, because it would help an attacker fingerprint the code they're attacking. Any unintentionally revealed information of any kind is a risk.
It sounds like:

1. There is an issue about leaking unintentional information in cross-realm symbols. It appears participants are not sure that's a big concern, but it's a legitimate concern.
2. There is a solution by enforcing a consistent ordering on symbol keys (right?) - for example by symbol creation time. This is presumably pretty easy for engines to do. It's hard to spec, since "creation time" is a concept that has no representation in the spec at the moment.
3. There is another solution - making the spec require the invariant we care about due to the potential side channel (symbol keys must always be iterated in the same order in records). However, a concern here is that it would leave too much up to the implementer, and engines could vary.

I honestly feel that language cohesiveness (objects and records allowing for different types of keys) was given up a bit too quickly. It's just a feeling and not criticism, as I am pretty ignorant of the process that happened here and thankful for the discussion.
I think a reasonable path forward could be to write up the potential side-channel attack, clearly documenting what extra information is exposed across realms and why that is not desirable. Then maybe figure out if solutions 2 (spec and require symbol creation order or some other consistent order) or 3 (require the invariant without specifying the order) are viable.