Rethinking of external assets
Closed this issue · 3 comments
Notes from a chat with @coretl @tacaswell @callumforrester @DiamondJoseph @tizayi
The motivation here is to improve the situation for using the same detector in fly scans and step scans, with efficient access to data. Lots more to discuss before we implement these disruptive ideas, so nobody panic. :- )
- Resource and Datum documents will be deprecated (!).
- As now, the `describe()` method continues to indicate which keys are backed by external data. (Perhaps we use a new word for this new scheme.)
- The `read()` method will simply omit keys that are backed by external data. The RunBundler can verify that non-external keys are present and external keys (as declared by the descriptor) are not present.
- If there are any external keys, RunBundler calls `collect_asset_docs()`. This should include a `Partition` document with some parameters that will be used to associate a slice of underlying data with a (length-1) slice of Events.
- It should include a `Resource2` (gotta name this...) that should have a mimetype and a dict of arbitrary `parameters`. The only restriction on parameters is that they are JSON-serializable.
- The `Partition` will also have `index_start`, `index_stop` corresponding to a slice in the underlying storage.
handler_class = registry[mimetype] # i.e. `image/tiff` or `application/x-nexus-something`
handler = handler_class(**parameters) # e.g. filename and whatever else goes here
# in contrast to Datum, where the call is handler(**datum_kwargs)
handler[index_start:index_stop]
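To make the snippet above concrete, here is a minimal runnable sketch of the lookup-and-slice flow. Everything in it is made up for illustration: the `InMemoryHandler` class, the `application/x-example-frames` mimetype, and the `n_frames` parameter are placeholders, and a real handler would presumably open a file named in `parameters` instead of building a list.

```python
import json

# Hypothetical registry mapping a mimetype to a handler class.
registry = {}


class InMemoryHandler:
    """Stand-in handler; a real one would open the file described by its parameters."""

    def __init__(self, n_frames):
        # `n_frames` is an invented parameter; each "frame" is just its own index.
        self._frames = list(range(n_frames))

    def __getitem__(self, key):
        # Accept the slice built from index_start / index_stop.
        return self._frames[key]


registry["application/x-example-frames"] = InMemoryHandler

# Shaped like the proposed Resource2: a mimetype plus arbitrary parameters.
resource = {
    "mimetype": "application/x-example-frames",
    "parameters": {"n_frames": 10},
}
# The only restriction on parameters: they must be JSON-serializable.
json.dumps(resource["parameters"])

# Shaped like the proposed Partition: a slice into the underlying storage.
partition = {"index_start": 2, "index_stop": 5}

handler_class = registry[resource["mimetype"]]
handler = handler_class(**resource["parameters"])
chunk = handler[partition["index_start"]:partition["index_stop"]]
```

Note that the handler is constructed once from the resource parameters and then only ever sees slices, so nothing per-datum has to be passed in.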
I sketched out the old and new schemes (with a single event; multiple events would make more use of the slice object).
Here's a suggestion as to how it could be implemented.
- StreamDatum
  - no `datum_kwargs`
  - no `stream_name`, instead a pointer to the Descriptor for which it provides some or all of the data_keys
  - `data_keys` list to show which data_keys of the Descriptor it provides
  - `seq_nums` is a `slice` object showing the Event numbers it corresponds to
  - `indexes` is a `slice` object passed to the StreamResource handler so it can hand back data and timestamps
- Event and EventPage
  - external `data_keys` will be missing from `data` and `timestamps`
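As a rough sketch of the StreamDatum fields listed above (not a schema proposal), something like the following could work. The `stream_resource` field name and the `indexes_for_event` helper are my own inventions, and the helper assumes every Event covers the same number of rows in the underlying storage, which the proposal does not guarantee.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamDatum:
    descriptor: str       # uid of the Descriptor (takes the place of stream_name)
    stream_resource: str  # uid of the StreamResource; field name is an assumption
    data_keys: List[str]  # which data_keys of the Descriptor this provides
    seq_nums: slice       # Event numbers this corresponds to
    indexes: slice        # passed to the StreamResource handler


def indexes_for_event(datum, seq_num):
    """Return the storage slice for one Event, assuming equal rows per Event."""
    first, stop = datum.seq_nums.start, datum.seq_nums.stop
    if not first <= seq_num < stop:
        raise ValueError(f"seq_num {seq_num} is outside {datum.seq_nums}")
    n_events = stop - first
    n_rows = datum.indexes.stop - datum.indexes.start
    rows_per_event, remainder = divmod(n_rows, n_events)
    if remainder:
        raise ValueError("rows do not divide evenly among Events")
    start = datum.indexes.start + (seq_num - first) * rows_per_event
    return slice(start, start + rows_per_event)


datum = StreamDatum(
    descriptor="descriptor-uid",
    stream_resource="stream-resource-uid",
    data_keys=["det_image"],
    seq_nums=slice(1, 4),  # Events 1, 2, 3
    indexes=slice(0, 3),   # rows 0, 1, 2 in the underlying storage
)
second_event_rows = indexes_for_event(datum, 2)
```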
The Event side of the diagram then becomes optional: if all detectors write external files, then there will be no Event documents.
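For illustration, the check that RunBundler could run on each reading might look roughly like this. The `check_reading` function is hypothetical, and it assumes external keys are flagged by the presence of an `external` entry in the descriptor's `data_keys`; the `"FILESTORE:"` value is only a placeholder, since the new scheme may use a different word.

```python
def check_reading(data_keys, reading):
    """Validate the output of read() against the descriptor's data_keys.

    Returns True if there are non-external keys (so an Event is needed),
    False if every key is external (so no Event would be emitted).
    """
    external = {key for key, info in data_keys.items() if "external" in info}
    internal = set(data_keys) - external

    missing = internal - set(reading)
    if missing:
        raise ValueError(f"read() is missing non-external keys: {sorted(missing)}")

    unexpected = set(reading) - internal
    if unexpected:
        raise ValueError(f"read() should not include keys: {sorted(unexpected)}")

    return bool(internal)


data_keys = {
    "motor": {"dtype": "number", "shape": [], "source": "example"},
    "det_image": {
        "dtype": "array",
        "shape": [4, 4],
        "source": "example",
        "external": "FILESTORE:",  # placeholder marker for an external key
    },
}

# A mixed reading: the motor is in read(), the detector image is omitted.
needs_event = check_reading(data_keys, {"motor": {"value": 1.0, "timestamp": 0.0}})

# All-external case: read() returns nothing, so no Event is needed.
only_external = {"det_image": data_keys["det_image"]}
needs_event_all_external = check_reading(only_external, {})
```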
@tcaswell @danielballan shall we begin writing this?
Yes, I think so. I'd like to do this in tandem with @DiamondJoseph's NeXus writers and a sketch of our proposed new storage model for Databroker to validate that everything works out the way we expect.
Can delete stream_names from StreamResource too