DiamondLightSource/workflows

Triggering authenticated workflows

DiamondJoseph opened this issue · 2 comments

Assuming:

  • There is a service called blueapi, which requires an authenticated request and creates "raw data"
  • There is a process called analysis, which generically consumes "raw data" and creates "processed data"
  • A user makes a request to blueapi to create "raw data", and knows they want a specific form of analysis to produce "processed data", either while blueapi is acting or afterwards
  • To leverage the workflow system, the user should not need to manually create the analysis instance
  • The analysis instance should write data to the same visit as the raw data and the request that spawned it
  • The analysis should be authorized to read only the raw data that it requires

sequenceDiagram
    actor Alice
    Note left of Alice: my_scan uses my_analysis
    Alice ->> +blueapi: run my_scan, visit=a1
    Note over Alice,blueapi: scope read data visit=a1
    Note over Alice,blueapi: scope write data visit=a1
    Note over Alice,blueapi: scope run my_analysis visit=a1

    participant raw as Raw Data Store<br>[via DataAPI]
    blueapi ->> raw: StartDocument runid=a1-1
    Note over blueapi,raw: AuthZ'd to write

    participant manager as Workflow Manager
    blueapi ->> manager: start my_analysis visit=a1 runid=a1-1
    Note over blueapi,manager: AuthZ'd to run
    
    create participant Analysis as my_analysis
    manager ->> +Analysis: creates
    Note over manager,Analysis: scope read data visit=a1
    Note over manager,Analysis: scope write data visit=a1
    
    participant processed as Processed Data Store<br>[via DataAPI]
    opt Live Analysis
    Analysis ->> raw: fetch data so far
    raw ->> Analysis: 
    Note over Analysis,raw: AuthZ'd to read
    Analysis ->> processed: processed data
    Note over Analysis,processed: AuthZ'd to write

    loop until scan over
    blueapi ->> raw: Documents
    Analysis -->> raw: poll for new data
    Analysis ->> processed: processed data
    end
    blueapi ->> raw: StopDocument
    Analysis -->> raw: poll for new data
    Analysis ->> processed: processed data
    end
    opt Post Processing
    blueapi ->> raw: Documents
    blueapi ->> -raw: StopDocument
    Analysis ->> raw: fetch all data
    raw ->> Analysis:     
    Note over raw,Analysis: AuthZ'd to read
    end
    deactivate Analysis

    destroy Analysis
    Analysis ->> processed: processed data
    Note over Analysis,processed: AuthZ'd to write

    Alice ->> raw: 
    raw ->> Alice: 
    Note over Alice,raw: AuthZ'd to read
    Alice ->> processed: 
    processed ->> Alice: 
    Note over Alice,processed: AuthZ'd to read
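
To make the delegation step concrete, here is a minimal sketch of the call from blueapi to the Workflow Manager shown in the diagram, assuming OAuth2-style bearer tokens and a hypothetical HTTP endpoint on the Workflow Manager (the endpoint path, field names and scope strings below are illustrative, not fixed by this issue):

import requests

# Hypothetical endpoint; the Workflow Manager's real API is not specified here.
WORKFLOW_MANAGER = "https://workflow-manager.example.com"

def start_analysis(token: str, visit: str, run_id: str) -> str:
    # blueapi asks the Workflow Manager to spawn my_analysis for one run.
    # The credential it passes on should carry only the scopes the analysis
    # needs: read/write data for this visit, nothing broader.
    response = requests.post(
        f"{WORKFLOW_MANAGER}/workflows/my_analysis",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "visit": visit,    # analysis writes to the same visit as the raw data
            "run_id": run_id,  # restricts reads to the run that spawned it
            "scopes": [
                f"read data visit={visit}",
                f"write data visit={visit}",
            ],
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["workflow_id"]

The diagram leaves open whether blueapi forwards the user's token as-is or exchanges it for one narrowed to just these two data scopes; either way the analysis should never hold broader credentials than the visit it was spawned for.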

@callumforrester has thoughts about whether the live/at-rest processing should look the same or not:

DISCLAIMER: I'm not in data analysis and my knowledge may be out of date

this looks the same regardless of whether it's post or live analysis.

Not quite: for post processing the code can and should be considerably simpler. There is no need to go through the data as if it were being streamed when it isn't; you want the code to say something like:

import numpy as np

def my_analysis(data_api):
    detector_data = data_api.get("saxs")[:]   # read the whole dataset at rest
    return np.average(detector_data, axis=0)  # one-shot reduction, no streaming machinery

This is especially important because (I believe) most of our use cases are still for post processing rather than live processing, so we shouldn't introduce unnecessary complexity into the majority use case.

See bluesky/tiled#437 for the Tiled implementation of the DataAPI informing the client that more data is available for consumption.
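
For contrast, here is a rough sketch of what the live path might look like if the analysis has to poll. It reuses the hypothetical data_api accessor from the snippet above, with scan_complete() and publish() standing in for whatever end-of-stream signal and output mechanism the DataAPI and workflow system eventually provide. The extra bookkeeping is exactly the complexity the post-processing case should not have to carry.

import time

import numpy as np

def my_live_analysis(data_api, publish, poll_interval: float = 1.0):
    # Poll for new frames and republish a running average as the scan proceeds.
    frames_seen = 0
    running_sum = None
    while True:
        detector_data = data_api.get("saxs")      # hypothetical accessor, as above
        new_frames = detector_data[frames_seen:]  # only frames not yet processed
        if len(new_frames):
            chunk_sum = np.sum(new_frames, axis=0)
            running_sum = chunk_sum if running_sum is None else running_sum + chunk_sum
            frames_seen += len(new_frames)
            publish(running_sum / frames_seen)    # emit updated processed data
        if data_api.scan_complete():              # hypothetical end-of-stream check
            break
        time.sleep(poll_interval)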