Should `JLSOFile` have `get`, `get!`, `keys` and `haskey` methods
Closed this issue · 6 comments
A JLSOFile
has both metadata as well as the serialised objects as a Dict{Symbol, Vector{UInt8}}
But you can interact with a jlso::JLSOFile
using getindex
and setindex!
like jlso[:x]
and jlso[:y] = 1
, which makes it tempting to interact with the jlso
like a Dict
which serialises/deserialises the objects to/from disk e.g.
jlso = read(file, JLSOFile)
y_data = get!(jlso, :y, 1)
I'm not exactly sure if we want a JLSOFile
should be exactly Dict-like (or even what exactly that means JuliaLang/julia#25941). But i think get
, get!
, keys
and haskey
methods make sense - what do you think? Possibly also iterate
, but less sure about that one.
Something like
Base.keys(jlso::JLSOFile) = keys(jlso.objects)
Base.haskey(jlso::JLSOFile, key) = haskey(jlso.objects, key)
Base.get(jlso::JLSOFile, key, default) = haskey(jlso, key) ? jlso[key] : default
Base.get!(jlso::JLSOFile, key, default) = get!(() -> default, jlso, key)
function Base.get!(func, jlso::JLSOFile, key)
return if haskey(jlso, key)
jlso[key]
else
jlso[key] = func()
end
end
and if we want iterate
then i guess just delegate that too?
Base.iterate(jlso::JLSOFile) = iterate(jlso.objects)
Base.iterate(jlso::JLSOFile, state) = iterate(jlso.objects, state)
Would a PR to define these be welcome?
Yeah, I don't think we'll want to subtype AbstractDict
or anything, but if you have a use-case for those wrapper methods then a PR would be welcome.
The pattern i'm looking to support is something like
if isfile(filename)
jlso = read(filename, JLSOFile)
return get!(jlso, :qux) do # read from JLSO file
result = parse("qux.csv") # or compute and store in JLSO file
end
else
result = parse("qux.csv")
save(filename, :qux, result)
return result
end
Although i'm not sure if this is a good pattern.
Partly because it's using JLSO as a "stable" file format that i can use as an on-disk cache. I don't need to serialised data to be readable in years. But i'm trusting that if I have a JLSO file created by a prior computation i can safely add new objects tot he same file.
FWIW, that might be better handled in your application logic with something like this?
if isfile(filename)
jlso = read(filename, JLSOFile)
haskey(jlso, :qux) || return parse("qux.csv") # Load qux.csv if not found
result = jlso[:qux]
result isa Vector{UInt8} && return parse("qux.csv") # Load qux.csv if not loadable
return result
else
result = parse("qux.csv")
save(filename, :qux, result)
return result
end
that'd still require defining haskey(jlso, :qux)
, right? Also that doesn't add the result to the JLSOFile, does it?
Handling the result isa Vector{UInt8}
case is a good point, although i don't think we've run into any problems with that yet, afaik
i think i'll open a PR for these. Because otherwise i'm a bit tempted to work with jlso_objects = JLSO.lead(filename)
which will eagerly deserialise things, or else work with jlso.objects
which feels less nice than having the methods automatically delegate to that where appropriate
that'd still require defining haskey(jlso, :qux), right?
Correct, you'd still want haskey
define. I was more just thinking that it could be a way of addressing potentially incompatible serialized objects for your caching purposes.
Also that doesn't add the result to the JLSOFile, does it?
Right, you could always replace parse
with some inner function that contains the lines in your else block and reuse it for each condition.