datalayer/jupyter-server-nbmodel

Add a pre- and post-execution hooks

Opened this issue · 5 comments

Problem

  • When the notebook is closed the metadata for cell timing are not updated
  • When the notebook is closed and execution of a long-running cell completes, there is no way for extensions to offer notifications (e.g. sending an email)

Proposed Solution

Allow extensions to register pre- and post-execution functions to run which would be allowed to populate the cell metadata.

Maybe special-case execution timing as a hook that is available by default, and enabled based on an argument passed to the JSON API?

Additional context

The timing metadata population is implemented in JupyterLab here:

https://github.com/jupyterlab/jupyterlab/blob/ef5a055976e02fee6c4d877a6a6df756f3323f79/packages/cells/src/widget.ts#L1736-L1787

I see there is a TODO note in this repository about it:

// FIXME quid of deletedCells and timing record

To give some more context here:

  • the pre-post execution hooks should be implemented in jupyter-server extension to allow to perform a side-effect even if the browser is not connected (for frontend there are signals already)
  • there is a couple of options for how to enable the registration of hooks
  • timing could be implemented either using such hooks, or directly in _execute_snippet by measuring time before and after client.execute_interactive call:
    async def _execute_snippet(
    client: jupyter_client.asynchronous.client.AsyncKernelClient,
    ydoc: jupyter_server_ydoc.app.YDocExtension | None,
    snippet: str,
    metadata: dict | None,
    stdin_hook: t.Callable[[dict], None] | None,
    ) -> dict[str, t.Any]:
    """Snippet executor
    Args:
    client: Kernel client
    ydoc: Jupyter server YDoc extension
    snippet: The code snippet to execute
    metadata: The code snippet metadata; e.g. to define the snippet context
    stdin_hook: The stdin message callback
    Returns:
    The execution status and outputs.
    """
    ycell = None
    if metadata is not None:
    ycell = await _get_ycell(ydoc, metadata)
    if ycell is not None:
    # Reset cell
    with ycell.doc.transaction():
    del ycell["outputs"][:]
    ycell["execution_count"] = None
    ycell["execution_state"] = "running"
    outputs = []
    # FIXME we don't check if the session is consistent (aka the kernel is linked to the document)
    # - should we?
    reply = await ensure_async(
    client.execute_interactive(
    snippet,
    # FIXME stream partial results
    output_hook=partial(_output_hook, outputs, ycell),
    stdin_hook=stdin_hook if client.allow_stdin else None,
    )
    )
    reply_content = reply["content"]
    if ycell is not None:
    with ycell.doc.transaction():
    ycell["execution_count"] = reply_content.get("execution_count")
    ycell["execution_state"] = "idle"
  • special-casing execution timing would be desirable for two reasons:
    • if another pre/post execution hook is added which takes non-negligible time to execute, it could be omitted from timing if we special case it (otherwise the timing results could depend on order of hook registration which I think would be confusing)
    • we can add an argument to the REST API to enable it, improving the feature parity with the default code execution mechanism if the runCell in typescript automatically passes this argument if user enabled the recordTiming option.

@fcollonval @echarles any preferences on how we should implement hooks? On frontend these are just ISignals. Let me number the options I listed above:

My own thoughs:

  • (a) is convenient to use once you know how entry points work:
    • requires writing a python package
    • the hook implementation can be a pure Python function
    • possibly slightly harder to test on CI
  • (b) seems easier from maintenance point as this is just an extra method on extension app but documented as public:
    • requires writing a server extension to get access to this extension's API
    • thus this also requires creating a python package
  • (c) would be in harmony with how jupyter-server does hooks:
    • can be used without requiring a python package,
    • could use a Set or List traitlet to allow registering multiple hooks

Thanks for raising the question @krassowski

Small question related to the need and setting aside the specific case of timing, would using the Jupyter Server events answer the need or do you see other cases like the timing that the reaction should be called immediately?

Great point! For the use case of "send me an email when execution finished", I think Jupyter Server event could be good enough.

For feature parity: frontend has onCellExecuted, selectionExecuted, onCellExecutionScheduled signals which can perform side effects on the cell or notebook, but cannot modify the code being executed. Jupyter Server event needs to be serialized to JSON, so these could not include the equivalent cell or notebook object, but I think it would be fine if the event contained:

  • cellId
  • documentId
  • success
  • kernelError (if any)

It could be interesting to explore pre-execution hooks that would allow to modify the code but I think this is a larger conversation and not required right now.

Shall we just add event emitter then?

Shall we just add event emitter then?

I would go that road as it is anyway a good addition. But let's keep this issue opened for further exploration.