Record representation is unstable
Opened this issue · 9 comments
Since we added support for optional fields on records, a crucial assumption was broken: up until then, records can be expected to have a fixed layout, as its definition.
So records with the same values for the same keys always had the same hash.
However, by introducing optional fields, we can easily create values that pass the equality check but have different hashes.
type t = {
a: string,
b?: string,
c: string,
}
let v1 = {
a: "a",
c: "c",
}
let v2 = {
...v1,
b: "b",
}
let v3 = {
a: "a",
b: "b",
c: "c"
}
assert (Hashtbl.hash(v2) == Hashtbl.hash(v3))
v2
and v3
have the same values for the same keys but in different order.
Obj.entries(v2)
=>[['a', 'a'], ['c', 'c'], ['b', 'b']]
Obj.entries(v3)
=>[['a', 'a'], ['b', 'b'], ['c', 'c']]
This literally means that users can't use a record as a key in a hashtable or other data structure. In other words, it's not a record.
One possible fix is to add entry sorting at runtime, which will add serious performance degradation wherever optional fields are used (e.g. JSXv4)
Another way is to initialize optional fields to None
, but this will again change the JS semantic.
Coercion for records also break this assumption (more fields can be on the object than what's represented by the record).
Honestly I think these are just quirks we'll have to live with in some form if we want optional record fields (which we do) and coercion (which we also do).
Semi-unrelated, but one thing I've thought about before is whether one could have an annotation for a record definition type that disallows optional fields and/or coercion. Or one annotation for each thing. That way you could set up a "protected" record that you know will have the exact runtime shape you expect (unless it's from an external, but 🤷 ).
Coercion for records also break this assumption (more fields can be on the object than what's represented by the record).
Honestly I think these are just quirks we'll have to live with in some form if we want optional record fields (which we do) and coercion (which we also do).
I think we should mention this caveat in the docs for optional fields. I rarely use record as the key (or a complex type), but who knows.
Actually that was never true because of FFI.
Coercion for records also break this assumption (more fields can be on the object than what's represented by the record).
IMO it should have runtime effects, not just be a simple type assertion.
type a = {
foo: string,
bar: string,
}
type b = {
bar: string,
}
a :> b
Output should be { bar: a.bar }
rather than a
. It also makes more sense to extend it to custom encoding/decoding later.
I know it has a tradeoff, but we need to care a little more for integrity, even if it needs runtime.
Actually that was never true because of FFI.
At least when using FFI explicitly, users can be quite careful about it. But in pure ReScript-produced code, this behavior is somewhat surprising to me.
Honestly I think these are just quirks we'll have to live with in some form if we want optional record fields (which we do) and coercion (which we also do).
If so, we should deprecate supporting comparison operations (==
, >
, <
, >=
, <=
) on records. It's a general property when dealing with immutable data, but if we can't fully support it, it's just an unreliable footgun.
If so, we should deprecate supporting comparison operations (
==
,>
,<
,>=
,<=
) on records. It's a general property when dealing with immutable data, but if we can't fully support it, it's just an unreliable footgun.
This is definitely a good idea. I wonder however if anyone is using them. That type of comparison could be quite powerful (==
for records anyway) but I wonder whether it actually works as expected today?
This is definitely a good idea. I wonder however if anyone is using them. That type of comparison could be quite powerful (== for records anyway) but I wonder whether it actually works as expected today?
OCaml std users may be using it, but there's no guarantee that it will work without bugs.
We're dropping OCaml std, and Belt is a little better because it makes users explicitly build their own Hashable
and Comparable
modules.
But still, using Hashtbl.hash
or comparison operators as its implementation is a common usage pattern that as far as I know.
However, the advantage of module functions is that it makes it easy to replace the implementation. Users can get both correctness and performance by replacing in a custom implementation.
Then we could extend language support something like @deriving
to support deriving hash and comparison, like Rust's derive macro, or Java's Lombok does.