microsoft/vscode

Notebook rendering approaches and API

rebornix opened this issue · 9 comments

Last iteration we built a prototype for rendering notebooks in the core. The main lesson we learnt from dealing with notebook outputs is that they can be arbitrary: an output can contain text, streams, images, HTML, and scripts. Moreover, some output types expect to live in the same browser environment and to access the DOM structure directly. This makes rendering outputs in the core challenging, as we can't just insert them into the UI. Our current approach is to render insecure outputs in a webview and lay code cells on top of the webview.

This month, we will set up a test extension that emits different types of outputs (acting as a mock notebook kernel) and hook it up to the core. We will use this extension to test output rendering and find out what is missing in the current implementation.

The work is happening now in #86632.

Notebook rendering improvement and testing

The rendering approach we take here is to render secure outputs (like markdown, text, ANSI) in the core and insecure ones in a webview. The core keeps their positions and dimensions in sync.

Our major focus is improving the scrolling experience for insecure outputs and exploring how we can support full Jupyter notebook outputs (nteract, ipywidgets) but still keep VS Code core free of any Jupyter Notebook concept.

  • Proposed API for Notebook providers registration and notebook execution
    • Limitations (common scripts loading/injection): we now support outputs with <script> tags.
  • Output rendering (safe outputs rendered in core)
    • Basic elements
      • Text/Stream
      • ANSI output
      • Images
        • PNG
        • Others?
  • Output rendering (insecure ones, rendered in webview)
    • HTML elements
      • static elements
      • script tags
        • We will create script tags and add them to the output container to trigger the browser download and evaluate them.
    • ipywidgets
      • ipywidgets requires a widget manager to be created, which manages state for both models and views.
      • rendering ipywidgets requires third-party libraries, but they should be transparent to the core.
      • extensions can create JS bundles for ipywidgets and its widget manager and send them back as outputs
    • nteract: https://components.nteract.io/#section-nteractoutputsmedia
    • Vega/Vega-lite
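
As a rough illustration of the split listed above, here is a minimal sketch of routing an output to the core or to the webview by MIME type. Everything in it is an assumption made for illustration: `RenderTarget`, `CORE_SAFE_MIME_TYPES` and `chooseRenderTarget` are hypothetical names, and stream/ANSI output is folded under `text/plain` for simplicity.

```typescript
// Hypothetical names throughout; none of this is VS Code API.
type RenderTarget = "core" | "webview";

// Output kinds the list above treats as safe to render in the core UI.
// Assumption: stream and ANSI output are represented here as text/plain.
const CORE_SAFE_MIME_TYPES: string[] = [
	"text/plain",
	"text/markdown",
	"image/png",
];

function chooseRenderTarget(mimeType: string): RenderTarget {
	// Anything not explicitly known to be safe goes to the webview, so an
	// unknown MIME type never reaches the core DOM.
	return CORE_SAFE_MIME_TYPES.indexOf(mimeType) !== -1 ? "core" : "webview";
}
```

Defaulting to the webview means that new output types (HTML, scripts, widgets, Vega) need no special handling in the core.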

API exploration

To test different types of notebook outputs, we started sketching a notebook API. The first prototype is as simple as a language provider that provides the content for a notebook opened in the editor:

export interface NotebookProvider {
	onDidChangeNotebook?: Event<{ resource: Uri, notebook: INotebook }>;
	resolveNotebook(resource: Uri): Promise<INotebook | undefined>;
	executeNotebook(resource: Uri): Promise<void>;
}

namespace window {
	export function registerNotebookProvider(
		notebookType: string,
		provider: NotebookProvider
	): Disposable;
}
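
To make the shape of this API more concrete, here is a minimal sketch of a mock provider, runnable outside VS Code. The types `Uri`, `DisposableLike` and `INotebook` are local stand-ins (the shape of `INotebook` is not defined in this issue, so `cells: string[]` is an assumption), `providers` is a hypothetical in-memory registry standing in for `window.registerNotebookProvider`, and the optional `onDidChangeNotebook` event is left out.

```typescript
// Local stand-ins so the sketch runs without the vscode module.
interface Uri { readonly path: string; }
interface DisposableLike { dispose(): void; }
// Assumption: the issue does not define INotebook; this shape is made up.
interface INotebook { cells: string[]; }

interface NotebookProvider {
	resolveNotebook(resource: Uri): Promise<INotebook | undefined>;
	executeNotebook(resource: Uri): Promise<void>;
}

// Hypothetical in-memory registry, keyed by notebook type.
const providers: { [notebookType: string]: NotebookProvider } = {};

function registerNotebookProvider(
	notebookType: string,
	provider: NotebookProvider
): DisposableLike {
	providers[notebookType] = provider;
	// Disposing the registration removes the provider again.
	return { dispose: () => { delete providers[notebookType]; } };
}

// A mock provider: .ipynb resources resolve to a fixed two-cell notebook,
// anything else resolves to undefined. Execution is a no-op.
const mockProvider: NotebookProvider = {
	resolveNotebook: (resource) =>
		Promise.resolve(
			/\.ipynb$/.test(resource.path)
				? { cells: ["print(1)", "print(2)"] }
				: undefined
		),
	executeNotebook: () => Promise.resolve(),
};

const registration = registerNotebookProvider("mock-notebook", mockProvider);
```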

As we noted in #86632, a notebook document consists of code cells and outputs. Every code cell is a text document and will be loaded in a Monaco Editor when visible. A notebook document should provide the following functionality:

  • store and provide access to code cells (what is a code cell here? string | string[] or vscode.TextDocument?)
  • support saving for code cells
  • update outputs and emit change events
  • support notebook or code cell execution
  • support notebook incremental update

The challenge here is how to describe a code cell: how the notebook provider will generate cells, how to access their latest content when they are updated in the UI, etc.
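
One possible shape for such a document, purely as a sketch: the issue leaves the cell representation open, so the plain `string` source below is just one of the candidates, and `CellSketch`, `NotebookDocumentSketch` and their members are hypothetical names.

```typescript
// All names are hypothetical; a string source is only one candidate
// representation for a code cell.
interface CellSketch {
	source: string;    // stand-in for the cell's text document
	outputs: string[]; // stand-in for the cell's outputs
}

type OutputListener = (cellIndex: number) => void;

class NotebookDocumentSketch {
	readonly cells: CellSketch[];
	private listeners: OutputListener[] = [];

	constructor(cells: CellSketch[]) {
		this.cells = cells;
	}

	// Subscribe to output changes; the returned function unsubscribes.
	onDidChangeOutputs(listener: OutputListener): () => void {
		this.listeners.push(listener);
		return () => {
			this.listeners = this.listeners.filter((l) => l !== listener);
		};
	}

	// Replace one cell's outputs and notify listeners (an incremental update).
	updateOutputs(cellIndex: number, outputs: string[]): void {
		const cell = this.cells[cellIndex];
		if (!cell) {
			throw new RangeError("No cell at index " + cellIndex);
		}
		cell.outputs = outputs;
		for (const listener of this.listeners) {
			listener(cellIndex);
		}
	}

	// "Saving" in this sketch just serializes the cell sources.
	serialize(): string {
		return JSON.stringify(this.cells.map((c) => c.source));
	}
}
```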

Alternative implementation

The major UX challenge with the current approach is that we have to sync notebook scrolling and the sizes of cells and outputs between the core UI and the webview. Users can see the webview lag behind while scrolling the editor. We will continue with the current approach and see how far we can go, but we also need to look into alternative implementations.
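
To show what has to stay in sync, here is a minimal sketch of the layout calculation, assuming the core knows each cell's editor height and the webview reports each output's height. `CellLayout` and `computeOutputTops` are hypothetical names, not part of the prototype.

```typescript
// Hypothetical shapes: heights in pixels, editor height measured by the
// core and output height reported back by the webview.
interface CellLayout {
	editorHeight: number;
	outputHeight: number;
}

// Returns, for each cell, the viewport-relative top offset at which the
// webview should place that cell's output, directly below its editor.
function computeOutputTops(cells: CellLayout[], scrollTop: number): number[] {
	const tops: number[] = [];
	let y = 0;
	for (const cell of cells) {
		y += cell.editorHeight;   // the output starts below the editor
		tops.push(y - scrollTop); // convert to viewport coordinates
		y += cell.outputHeight;   // the next cell starts below the output
	}
	return tops;
}
```

Every scroll event and every output resize changes these offsets, and the result has to cross the boundary between the core and the webview, which is where the visible lag comes from.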


Notes

RequireJS/AMD

Jupyter uses RequireJS to handle JavaScript libraries and the whole ecosystem seems to be built on top of it (correct me if I'm wrong). With RequireJS or any AMD loader, notebooks can load dependencies dynamically, and this is fundamental infrastructure for interactive outputs like ipywidgets and Vega.

While implementing the notebook rendering in the core, we still want to keep the core clean and free of Jupyter knowledge, especially how and what Jupyter needs to render outputs. The JavaScript dependencies an output requires should all come from notebook providers.

VS Code has a built-in AMD loader and we can inject this loader into the output rendering environment in advance. Jupyter notebook providers can take it for granted (assuming that there is always an AMD loader, so notebook providers don't need to inject RequireJS themselves). The heavy lifting for a Jupyter notebook provider is declaring the dynamic dependencies it needs for rendering an interactive output. Thus some output marshaling is required.

Let's use Vega as an example. The output we receive from a Jupyter notebook looks similar to the following:

const spec = {
    "$schema": "https://vega.github.io/schema/vega/v5.json", "width": 400, "height": 200, "padding": 5, "data": [{ "name": "table", "values": [{ "category": "A", "amount": 28 }, { "category": "B", "amount": 55 }, { "category": "C", "amount": 43 }, { "category": "D", "amount": 91 }, { "category": "E", "amount": 81 }, { "category": "F", "amount": 53 }, { "category": "G", "amount": 19 }, { "category": "H", "amount": 87 }] }], "signals": [{ "name": "tooltip", "value": {}, "on": [{ "events": "rect:mouseover", "update": "datum" }, { "events": "rect:mouseout", "update": "{}" }] }], "scales": [{ "name": "xscale", "type": "band", "domain": { "data": "table", "field": "category" }, "range": "width", "padding": 0.05, "round": true }, { "name": "yscale", "domain": { "data": "table", "field": "amount" }, "nice": true, "range": "height" }], "axes": [{ "orient": "bottom", "scale": "xscale" }, { "orient": "left", "scale": "yscale" }], "marks": [{ "type": "rect", "from": { "data": "table" }, "encode": { "enter": { "x": { "scale": "xscale", "field": "category" }, "width": { "scale": "xscale", "band": 1 }, "y": { "scale": "yscale", "field": "amount" }, "y2": { "scale": "yscale", "value": 0 } }, "update": { "fill": { "value": "steelblue" } }, "hover": { "fill": { "value": "red" } } } }, { "type": "text", "encode": { "enter": { "align": { "value": "center" }, "baseline": { "value": "bottom" }, "fill": { "value": "#333" } }, "update": { "x": { "scale": "xscale", "signal": "tooltip.category", "band": 0.5 }, "y": { "scale": "yscale", "signal": "tooltip.amount", "offset": -2 }, "text": { "signal": "tooltip.amount" }, "fillOpacity": [{ "test": "datum === tooltip", "value": 0 }, { "value": 1 }] } } }]
};
const opt = {};
const type = "vega";
const id = "2a522180-bd9f-476c-b4f0-5e6311bccbc3";

const output_area = this;

require(["nbextensions/jupyter-vega/index"], function (vega) {
    const target = document.createElement("div");
    target.id = id;
    target.className = "vega-embed";

    const style = document.createElement("style");
    style.textContent = [".vega-embed .error p {", "  color: firebrick;", "  font-size: 14px;", "}", ].join("\n");

    // element is a jQuery wrapped DOM element inside the output area
    // see http://ipython.readthedocs.io/en/stable/api/generated/\
    // IPython.display.html#IPython.display.Javascript.__init__
    element[0].appendChild(target);
    element[0].appendChild(style);

    vega.render("#" + id, spec, type, opt, output_area);
});

As you can see above, the output makes the following assumptions:

  1. AMD loader knows where to fetch nbextensions/jupyter-vega/index
  2. element is a jQuery wrapped DOM element inside the output area
  3. this contains output information

If VS Code core inserts this script directly into the DOM tree, it won't execute successfully, because the core is not aware of the assumptions above. The notebook provider, however, can wrap it with all the required information:

require.config({
    paths: {
        "nbextensions/jupyter-vega/index": "vscode-resource://file///Users/penlv/code/vscode/extensions/notebook-test/dist/jupyter-vega/index.js"
    }
});

(function(element) {
    const spec =
    ...

    const output_area = this;
    require(["nbextensions/jupyter-vega/index"], function (vega) {
       ....
    });
}).call({ outputs: ["#vegatest"] }, [document.getElementById("vegatest")]);
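
On the provider side, this wrapping could be generated by a small helper. The sketch below is one way to do it under the same three assumptions; `wrapOutputScript` is a hypothetical name, not part of any VS Code or Jupyter API, and it only builds the script text without executing it.

```typescript
// Hypothetical helper: builds the wrapped script text for one output.
function wrapOutputScript(
	script: string,
	paths: { [moduleId: string]: string },
	elementId: string
): string {
	// JSON.stringify yields a safely quoted JavaScript string literal.
	const id = JSON.stringify(elementId);
	return [
		// 1. Tell the AMD loader where each module lives.
		`require.config({ paths: ${JSON.stringify(paths)} });`,
		// 2. Provide `element` (array-like, since the script reads element[0])
		//    and `this` (the output area) to the original script.
		"(function (element) {",
		script,
		`}).call({ outputs: ["#" + ${id}] }, [document.getElementById(${id})]);`,
	].join("\n");
}
```

With this, the core only ever sees an opaque script, and the knowledge about module paths and the `element` and `this` conventions stays in the notebook provider.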

It would be great if you could also investigate the vega and vega-lite output types. Both JupyterLab and nteract support the vega and vega-lite MIME output types out of the box. CC @domoritz from the vega team.

Vega and Vega-Lite work if you use https://github.com/microsoft/vscode-python for Python notebooks. I think you could use the same approach (using nteract components) here.

Thanks for the update @rebornix
A few questions, might be too early, but thought I'd ask:

  • Is it possible to have multiple notebook providers?
  • Can an extension author inspect contents of a notebook that's being displayed?
    • Today one can create extensions for .js, .md files easily by accessing the TextDocument. Would a similar extensibility be available to notebook contents/documents?
  • Tied to the previous question: can we have multiple extensions that contribute to the notebook UI?

@davidanthoff @domoritz thanks for the pointers, I'll take Vega into account while testing.

@DonJayamanne thanks for the good questions. We don't know all the answers yet, but here is what we think so far:

Is it possible to have multiple notebook providers?

Yes, an extension can register multiple notebook providers, but for each resource, there should be one active provider. For example, an extension can register one provider for .ipynb files and another one for .test.ipynb files (see samples https://github.com/microsoft/vscode/blob/rebornix/notebook/extensions/notebook-test/package.json#L31-L32 , please note nothing is finalized yet so the sample just presents the ideas).
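
How the single active provider gets picked is not decided yet. As one possible rule, sketched here only for illustration, the provider with the longest matching file name suffix could win, so `.test.ipynb` beats `.ipynb`. `ProviderEntry` and `pickProvider` are hypothetical names.

```typescript
// Hypothetical: one registered provider and the file suffix it claims.
interface ProviderEntry {
	notebookType: string;
	suffix: string; // e.g. ".ipynb"
}

// Assumed rule: among all entries whose suffix matches the file name,
// the most specific (longest) suffix wins.
function pickProvider(
	fileName: string,
	entries: ProviderEntry[]
): ProviderEntry | undefined {
	let best: ProviderEntry | undefined;
	for (const entry of entries) {
		const matches =
			fileName.length >= entry.suffix.length &&
			fileName.slice(-entry.suffix.length) === entry.suffix;
		if (matches && (!best || entry.suffix.length > best.suffix.length)) {
			best = entry;
		}
	}
	return best;
}
```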

Can an extension author inspect contents of a notebook that's being displayed?

The extension that contributes the notebook provider should be able to access that. Like you mentioned, it would be somewhat similar to TextDocument. Extensions will be able to read the text documents we create for code cells, so they can read the live content when users attempt to run a code cell.

can we have multiple extensions that contribute to the notebook UI

Can you elaborate more on this one? Are you saying multiple extensions contributing to the same notebook file/editor?

Can an extension author inspect contents of a notebook that's being displayed?

Does this mean we can have two separate extensions, one that contributes a notebook provider and another that doesn't but only wishes to read the contents of an active notebook editor (like TextDocument)?

can we have multiple extensions that contribute to the notebook UI

Yes. Can we have other extensions add adornments or behaviors to the UI (but not provide any functionality for execution of cells, and the like).
Eg. can a separate extension contribute an adornment to collapse or hide a cell?

Thanks for answering the questions.

Does this mean we can have two separate extensions, one that contributes a notebook provider and another that doesn't but only wishes to read the contents of an active notebook editor (like TextDocument)?

I don't have a concrete answer for this one. If there is no obvious security or other concern, I think the notebook content can be public to all extensions.

Yes. Can we have other extensions add adornments or behaviors to the UI (but not provide any functionality for execution of cells, and the like).
Eg. can a separate extension contribute an adornment to collapse or hide a cell?

We can probably support this, but we are not sure how yet. It might be similar to a normal Monaco Editor, where extensions can contribute editor actions, keybindings, or decorations. What we will support, and how, is not clear at this moment.

@rebornix One more item.
Jupyter Notebook and other notebook providers have the ability to hide the editor and display just the output cells. Hope you can accommodate this in your design. Today this is done via 3rd party extensions (quite popular, to hide code and just look at the results). It's one approach to building presentations...

Other variations include hiding editor cells selectively, i.e. not all.
Thanks again

blois commented

I'm a maintainer of Google Colaboratory's output isolation feature and am a strong proponent of isolating notebook outputs- very happy to see this work being explored (the current notebook editor's approach seems a bit risky, can expand on this offline if desired but I suspect the issues are known).

A few things-

  • We isolate outputs which are loaded with a notebook from outputs which are created during the current execution session and give more privileges to outputs from the current session. We intentionally only provide limited communication channels back to the kernel but then prevent all communication for outputs which are not from the current session.
  • It sounds like the approach you're investigating is similar to how https://observablehq.com/ operates- I found this approach got complicated when we had to have two categories of outputs.
  • Colab uses an iframe-per-cell which aids the isolation but does behave a bit differently than the rest of the notebook editors. For the most part I believe there have been minimal tweaks needed to get things working, some things such as third-party widgets will require a bit more. An example of plotly, bokeh and widgets: https://colab.sandbox.google.com/gist/blois/26afff508034bd82d1e659578e5e1dc0/outputs.ipynb
  • Performance of iframe creation is definitely a concern, see my comment here: https://groups.google.com/d/msg/jupyter/jnsXG-RmbE0/CCU3ZR_rAAAJ.

The purpose of this issue (exploring rendering approaches and a first cut of the API) was achieved, so we are going to close it and track our work in new issues. Thanks everyone for your contributions to this topic!