bluesky/event-model

Rethinking of external assets

Closed this issue · 3 comments

Notes from a chat with @coretl @tacaswell @callumforrester @DiamondJoseph @tizayi

The motivation here is to make it easier to use the same detector in both fly scans and step scans while keeping data access efficient. There is a lot more to discuss before we implement these disruptive ideas, so nobody panic. :-)


  • Resource and Datum documents will be deprecated (!).
  • As now, the describe() method continues to indicate which keys are backed by external data. (Perhaps we use a new word for this new scheme.)
  • The read() method will simply omit keys that are backed by external data. The RunBundler can verify that non-external keys are present and external keys (as declared by descriptor) are not present.
  • If there are any external keys, RunBundler calls collect_asset_docs(). This should include a Partition document with some parameters that will be used to associate a slice of underlying data with a (length-1) slice of Events.
  • It should also include a Resource2 (gotta name this...) that has a mimetype and a dict of arbitrary parameters. The only restriction on the parameters is that they are JSON-serializable.
  • The Partition will also have index_start, index_stop corresponding to a slice in the underlying storage.
handler_class = registry[mimetype]  # e.g. `image/tiff` or `application/x-nexus-something`
handler = handler_class(**parameters)  # e.g. filename and whatever else goes here
# In contrast to the Datum scheme, which calls handler(**datum_kwargs),
# the handler is simply sliced with the Partition's indices:
handler[index_start:index_stop]
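
To make the lookup above concrete, here is a minimal runnable sketch. Everything beyond `mimetype`, `parameters`, `index_start` and `index_stop` is an assumption made for illustration: the `ListBackedHandler` class, the `application/x-example-frames` mimetype, and the `frames` parameter are hypothetical stand-ins for a real file reader, not part of any proposed schema.

```python
import json
from typing import Any, Dict, List, Type


class ListBackedHandler:
    """Hypothetical handler standing in for a TIFF or NeXus reader."""

    def __init__(self, **parameters: Any) -> None:
        # A real handler would open parameters["filename"]. Here the "file"
        # contents are passed in directly so the sketch runs without any I/O.
        self._frames: List[Any] = list(parameters["frames"])

    def __getitem__(self, index: slice) -> List[Any]:
        # Slicing replaces the per-datum call handler(**datum_kwargs).
        return self._frames[index]


# Hypothetical registry keyed by mimetype.
registry: Dict[str, Type[ListBackedHandler]] = {
    "application/x-example-frames": ListBackedHandler,
}

# Illustrative document shapes (assumed, not a finalized schema).
resource = {
    "mimetype": "application/x-example-frames",
    "parameters": {"frames": ["f0", "f1", "f2", "f3"]},
}
partition = {"index_start": 1, "index_stop": 3}

# The only stated restriction: parameters must be JSON-serializable.
json.dumps(resource["parameters"])


def load(resource: Dict[str, Any], partition: Dict[str, int]) -> List[Any]:
    """Resolve a Partition against its resource by slicing the handler."""
    handler_class = registry[resource["mimetype"]]
    handler = handler_class(**resource["parameters"])
    return handler[partition["index_start"]:partition["index_stop"]]


frames = load(resource, partition)
```

With these example documents, `frames` holds the two entries at storage indices 1 and 2. The handler is constructed once per resource and then sliced, so no per-datum keyword arguments are needed.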

I sketched out the old and new schemes (shown with a single event; multiple events would make more use of the slice object).

[Diagram: stream model suggestions (Excalidraw sketch)]

Here's a suggestion as to how it could be implemented.

  • StreamDatum
    • no datum_kwargs
    • no stream_name; instead, a pointer to the Descriptor for which it provides some or all of the data_keys
    • data_keys list to show which data_keys of the Descriptor it provides
    • seq_nums is a slice object showing the Event numbers it corresponds to
    • indexes is a slice object passed to the StreamResource handler so it can hand back data and timestamps
  • Event and EventPage
    • external data_keys will be missing from data and timestamps
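
As a rough illustration of the StreamDatum fields listed above, the sketch below models one as a Python dataclass. The field names follow the list; the representation of each "slice object" as a `{"start", "stop"}` dict (so it stays JSON-serializable), the `StreamDatumSketch` and `as_slice` names, and the example values are all assumptions, not an agreed schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StreamDatumSketch:
    """Hypothetical shape for a StreamDatum document."""

    descriptor: str            # uid of the Descriptor (replaces stream_name)
    data_keys: List[str]       # which of the Descriptor's data_keys it provides
    seq_nums: Dict[str, int]   # half-open range of Event numbers it covers
    indexes: Dict[str, int]    # half-open range passed to the handler


def as_slice(bounds: Dict[str, int]) -> slice:
    """Convert an assumed {"start", "stop"} dict into a Python slice."""
    return slice(bounds["start"], bounds["stop"])


stream_datum = StreamDatumSketch(
    descriptor="descriptor-uid-placeholder",
    data_keys=["det_image"],
    seq_nums={"start": 1, "stop": 4},
    indexes={"start": 0, "stop": 3},
)

# Stand-in for what a StreamResource handler would expose.
storage = ["frame0", "frame1", "frame2", "frame3", "frame4"]

frames = storage[as_slice(stream_datum.indexes)]
event_numbers = list(
    range(stream_datum.seq_nums["start"], stream_datum.seq_nums["stop"])
)
```

Here one StreamDatum covers three Events at once, which is where the slice form pays off compared with one Datum per Event.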

The Event side of the diagram then becomes optional: if all detectors write external files, there will be no Event documents.
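
A sketch of the check this implies, assuming external keys are flagged by a truthy `external` entry in the Descriptor's data_keys (the `"STREAM:"` value is only a placeholder, and the function names are hypothetical, not RunBundler API):

```python
from typing import Any, Dict, Set, Tuple


def split_keys(descriptor: Dict[str, Any]) -> Tuple[Set[str], Set[str]]:
    """Return (internal, external) key names declared by a Descriptor."""
    data_keys = descriptor["data_keys"]
    external = {key for key, info in data_keys.items() if info.get("external")}
    return set(data_keys) - external, external


def validate_event(descriptor: Dict[str, Any], event: Dict[str, Any]) -> None:
    """Check internal keys are present and external keys are absent."""
    internal, external = split_keys(descriptor)
    for field in ("data", "timestamps"):
        present = set(event[field])
        missing = internal - present
        unexpected = external & present
        if missing:
            raise ValueError(f"{field} is missing keys: {sorted(missing)}")
        if unexpected:
            raise ValueError(f"{field} has external keys: {sorted(unexpected)}")


def needs_events(descriptor: Dict[str, Any]) -> bool:
    """Events are only needed if at least one key is not external."""
    internal, _ = split_keys(descriptor)
    return bool(internal)


mixed = {
    "data_keys": {
        "motor": {"dtype": "number", "shape": [], "source": "example"},
        "det_image": {
            "dtype": "array",
            "shape": [4, 4],
            "source": "example",
            "external": "STREAM:",  # placeholder marker, an assumption
        },
    }
}
all_external = {"data_keys": {"det_image": mixed["data_keys"]["det_image"]}}

good_event = {"data": {"motor": 1.5}, "timestamps": {"motor": 1700000000.0}}
validate_event(mixed, good_event)  # passes silently

mixed_needs_events = needs_events(mixed)
all_external_needs_events = needs_events(all_external)
```

For the `all_external` descriptor there are no internal keys, so nothing would be left to put in an Event.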

@tacaswell @danielballan shall we begin writing this?

Yes, I think so. I'd like to do this in tandem with @DiamondJoseph's NeXus writers and a sketch of our proposed new storage model for Databroker to validate that everything works out the way we expect.

We can delete stream_names from StreamResource too.