JuliaPackaging/JLLWrappers.jl

What does it mean to have a Platform as a key in a Dict

Closed this issue · 8 comments

The part here:

valid_wrappers = Dict{Platform,String}()

uses a Platform as a key in a dictionary. But a Platform is an object that AFAIU will use objectid as part of its hash computation, so for example:

julia> valid_wrappers = Dict{Platform,String}()
Dict{Platform, String}()

julia> p = Platform("x86_64", "windows"; cuda = "10.1");

julia> q = Platform("x86_64", "windows"; cuda = "10.1");

julia> valid_wrappers[p] = "foo"
"foo"

julia> valid_wrappers[p2] = "bar"
"bar"

julia> valid_wrappers
Dict{Platform, String} with 2 entries:
  Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "foo"
  Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "bar"

It's also a bit unclear to me what happens when you serialize such a dict. Is it still valid when you read it back? Would an IdDict be more suitable here to emphasize that the key is based on the constructed object itself and not the content?

I'm not sure I follow. Assuming that p2 in your example is q, I get this:

julia> using Base.BinaryPlatforms

julia> valid_wrappers = Dict{Platform,String}()
Dict{Platform, String}()

julia> p = Platform("x86_64", "windows"; cuda = "10.1");

julia> q = Platform("x86_64", "windows"; cuda = "10.1");

julia> valid_wrappers[p] = "foo"
"foo"

julia> valid_wrappers
Dict{Platform, String} with 1 entry:
  Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "foo"

julia> valid_wrappers[q] = "bar"
"bar"

julia> valid_wrappers
Dict{Platform, String} with 1 entry:
  Platform(Dict("arch"=>"x86_64", "os"=>"windows", "cuda"=>"10.1"), Dict{String, Function}()) => "bar"

It's also a bit unclear to me what happens when you serialize such a dict. Is it still valid when you read it back?

This is in a let, so it isn't serialised to the ji file, is it? Or are you referring to something else?

Yeah, I see the issue. On the latest master, p and q don't hash() to the same value, so they occupy separate buckets. The reason this isn't a problem for JLL wrappers so far is that we use the actual same objects later when we index into the dictionary, but this is fragile, and we shouldn't rely upon it.

We may want instead to key the dictionary by triplet(p) instead of p directly, as the triplet represents the canonical serialization of the Platform object.

Note that while the serialization captures most things it doesn't capture comparison strategies, but that's fine, since we don't care about those here at all.

On the latest master, p and q don't hash() to the same value, so they occupy separate buckets. The reason this isn't a problem for JLL wrappers so far is that we use the actual same objects later when we index into the dictionary, but this is fragile, and we shouldn't rely upon it.

Yeah, that was pretty much my point. It seems good to either be explicit that the id of the object matters (by using an IdDict) or define hash functions. But using objects with mutable internal state is also a bit fragile when using them as keys because the hash gets "cached" in the dictionary when it is inserted. But if a Platform is always created completely up front and then never mutated I think it should be fine?

Ah, I didn't think of defining hash(). I'll open a PR to define a stable hash(::Platform) function defined as hash(hash(a.tags), hash(a.compare_strategies))

Can this be closed since JuliaLang/julia#37734 was merged?

I guess, imo, it should still maybe be an IdDict because we are still talking about the identity of the object and not the behavior since changing out the comparison strategy with x -> old_comparison_strategy(x) will give a new hash. But it's better now.