What does it mean to have a Platform as a key in a Dict
Closed this issue · 8 comments
The part here:
JLLWrappers.jl/src/toplevel_generators.jl
Line 125 in bc3b6af
uses a Platform
as a key in a dictionary. But a Platform
is an object that AFAIU will use objectid
as part of its hash computation, so for example:
julia> valid_wrappers = Dict{Platform,String}()
Dict{Platform, String}()
julia> p = Platform("x86_64", "windows"; cuda = "10.1");
julia> q = Platform("x86_64", "windows"; cuda = "10.1");
julia> valid_wrappers[p] = "foo"
"foo"
julia> valid_wrappers[p2] = "bar"
"bar"
julia> valid_wrappers
Dict{Platform, String} with 2 entries:
Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "foo"
Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "bar"
It's also a bit unclear to me what happens when you serialize such a dict. Is it still valid when you read it back? Would an IdDict
be more suitable here to emphasize that the key is based on the constructed object itself and not the content?
I'm not sure I follow. Assuming that p2
in your example is q
, I get this:
julia> using Base.BinaryPlatforms
julia> valid_wrappers = Dict{Platform,String}()
Dict{Platform, String}()
julia> p = Platform("x86_64", "windows"; cuda = "10.1");
julia> q = Platform("x86_64", "windows"; cuda = "10.1");
julia> valid_wrappers[p] = "foo"
"foo"
julia> valid_wrappers
Dict{Platform, String} with 1 entry:
Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "foo"
julia> valid_wrappers[q] = "bar"
"bar"
julia> valid_wrappers
Dict{Platform, String} with 1 entry:
Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "bar"
It's also a bit unclear to me what happens when you serialize such a dict. Is it still valid when you read it back?
This is in a let
, so it isn't serialised to the ji
file, is it? Or are you referring to something else?
Yeah, I see the issue. On the latest master
, p
and q
don't hash()
to the same value, so they occupy separate buckets. The reason this isn't a problem for JLL wrappers so far is that we use the actual same objects later when we index into the dictionary, but this is fragile, and we shouldn't rely upon it.
We may want instead to key the dictionary by triplet(p)
instead of p
directly, as the triplet represents the canonical serialization of the Platform
object.
Note that while the serialization captures most things it doesn't capture comparison strategies, but that's fine, since we don't care about those here at all.
On the latest master, p and q don't hash() to the same value, so they occupy separate buckets. The reason this isn't a problem for JLL wrappers so far is that we use the actual same objects later when we index into the dictionary, but this is fragile, and we shouldn't rely upon it.
Yeah, that was pretty much my point. It seems good to either be explicit that the id of the object matters (by using an IdDict
) or define hash functions. But using objects with mutable internal state is also a bit fragile when using them as keys because the hash gets "cached" in the dictionary when it is inserted. But if a Platform
is always created completely up front and then never mutated I think it should be fine?
Ah, I didn't think of defining hash()
. I'll open a PR to define a stable hash(::Platform)
function defined as hash(hash(a.tags), hash(a.compare_strategies))
Can this be closed since JuliaLang/julia#37734 was merged?
I guess, imo, it should still maybe be an IdDict
because we are still talking about the identity of the object and not the behavior since changing out the comparison strategy with x -> old_comparison_strategy(x)
will give a new hash. But it's better now.