gopherdata/gophernotes

Need more integrated HTML support

kesmit13 opened this issue · 23 comments

Currently, the only way that I see to display HTML in a notebook is to use display.HTML(obj). It would be nice if there something like an HTMLer interface that would allow the HTML representation of an object to be displayed automatically instead of the text version. Something like the following:

type HTMLer interface {
    HTML() string
}

The original proposal for display by @SpencerPark was quite similar to yours - it defined many interfaces, one per media type (HTML, JSON, PNG, PDF...).

It certainly has its appeal, but in my opinion it also has a drawback: it requires creating types that implement such interfaces. So in simple cases I felt it would be more cumbersome to use, which spawned the discussion in #105 and led to the current solution: the functions display.HTML(), display.PNG() etc. that return a display.Data, which in turn is handled specially by gophernotes.

Now that the simple cases are already handled by the current solution, it may be a good time to re-evaluate a possible additional solution for more complex needs, and check whether passing around instances of display.Data suffices, or it's better to do something else - for example via interfaces, as you propose.

For sure, an additional mechanism to display multimedia data would need to cover most (or better, all) currently supported types, but starting with one can be a stepping stone.

For example, how would you propose to extend your idea to other types?
Something like

type JSONer interface {
  JSON() string
}
type PDFer interface {
  PDF() []byte
}
type PNGer interface {
  PNG() []byte
}
// ...

or something else?
To be fully honest, such interface do not seem very appealing to me, although I cannot easily explain why - maybe they look a bit "artificial" ?

I guess I don't see this as any different than the _repr_html_, _repr_javascript_, _repr_latex_, etc. methods in the Python kernel. As a long time user of the Python version of Jupyter, I found it a bit odd that there wasn't a way of automatically having objects rendered in the browser, and having to manually call a display method.

I see it all acting like the Python kernel version. If an object supported the interface, it would call it, plug that output into the appropriate mime type slot in the output data structure, and let the client decide which one it wants to use.

It does get a little tricky in the cases in render where it accepts multiple objects since some may support an interface and others may not. That's not too difficult to handle in the HTML case, as I did in my pull request, but something like PDF or PNG might not work so well. But I'm not sure when multiple objects get passed into the render function.

I am not an expert of Jupyter with the Python kernel - actually I never used it with Python - so if this brings more similarity to well-known Python API, it may be a good idea.

Currently, gophernotes attempts to display results graphically only if one of them is non-nil, so it can never have problems with multiple objects to display graphically at the same time. This is a design choice that @SpencerPark advocated strongly for uniformity with Jupyter+Python, which he knows quite well.

Yes this is a tricky situation because we want to fit Go comfortably into Jupyter but Jupyter (and the packages around it) are very much biased by the original implementation which was IPython.

There is a lot of overlap between what people think Jupyter does and what the python kernel is doing and so they expect it to work in other kernels as well but it doesn't always make as much sense in Go as it did in Python. IPython.display module being a very popular one with the common interfaces outlined in IPython.display.display (the _repr_html_ etc that @kesmit13 mentioned). There are also functions like display_html which more closely mirror what we have now in gophernotes but I don't find their use as common.

I don't know Go as well so the idiomatic way of implementing such a feature is not something I can argue strongly for but interfaces seem to make sense to me. I remember hitting a snag with gomacro a while ago with interpreted interfaces which was something that stopped progress on this approach.

I'll link my old attempt an an interface based implementation if it is any help. runtime.go

@cosmos72 Could a user implement one of these interfaces (like HTMLer) on an interpreted type they create in a notebook?

I believe @SpencerPark is correct that implementing methods on interpreted types does not work in gomacro, but if interfaces were supported, people could at least implement them in their own packages (which is what I'm trying to do).

Thanks for joining the thread @SpencerPark :)

Actually, gomacro "fast" interpreter supports declaring methods on interpreted types, supports declaring interpreted interfaces, and both compiled/interpreted types can implement both compiled/interpreted interfaces (wow, that's a mouthful).

The "classic" interpreter instead (which gophernotes used in the past) does not support interpreted interfaces.

Having said that, I had to use a lot of tricks to implement such features, and they are actually emulated: they work quite well inside the interpreter (there are some unimplemented corner cases), but extracting a value of an interpreted type from the interpreter and using it in compiled code reveals the differences: the type is actually unnamed, it has no methods and thus cannot implement any interface (interpreted or compiled). To retrieve the methods you must ask them to the interpreter. Same to convert it to an interface. This cannot be fixed, because gomacro uses the reflect package to create types, which does not support creating new named types and new interface types.

One of the unimplemented corner cases is important in this discussion: if you convert a value of an interpreted type to interface{}, you no longer have access to its full type and its methods.
This means that although an interpreted type can implement the interface HTMLer, naively converting it to interface{} will "lose" all the methods, and thus the value will no longer be convertible to HTMLer, because the method HTML() string is now missing.

That's quite abstract, so let's see a practical example: if you run the following in gophernotes

type HTMLer interface {
    HTML() string
}
type MyHtml struct { }
func (m MyHtml) HTML() string {
  return "<h1>hello, world!</h1>"
}
var m MyHtml

m // top-level expression, is returned to gophernotes for displaying

then gophernotes can detect that m has interpreted type MyHtml and that such type implements HTML (it need to ask the interpreter for such information, though). So it would be able to display it as HTML.

This would work too:

var h HTMLer = m
h

because the returned type is now HTMLer, exactly what gophernotes would check for.

Instead, the following

var i interface{} = m
i

loses such information: there is no way to recover the type MyHtml and its methods from the value i - the only thing you can recover is that it has some unnamed type struct {} and no methods. Thus gophernotes would not be able to display it as HTML.

In summary: yes, gomacro supports interpreted interfaces and methods on interpreted types, but they are emulated and some use cases are not supported. One of the unsupported cases is the conversion from interpreted type to interface{} and back - which in my opinion would be needed to make the proposed interface HTMLer really useful.

I don't see the issue with gomacro as that big of a deal. In Python Jupyter notebooks, people don't generally define their _repr_*_ methods in the notebook. Those methods are typically defined on objects in libraries that have notebook integration (e.g., pandas, matplotlib, plotly, etc.). That way, when someone's library is used in a notebook context, it automatically has nicer rendering capabilities than just using String when used as the last expression in a cell.

I think that is a much larger use-case than defining rendering methods interactively in a notebook. In fact, most Python Jupyter notebook users don't even know about the _repr_*_ methods, they just get the benefits by using libraries that do.

Thanks for providing context @kesmit13

Yes, it makes sense that implementing Python _repr_*_ methods, or Go HTMLer interface and friends is something that library authors could/would do,
while it's not how users would normally convert strings or binary data at REPL for better displaying - at REPL, calling display.HTML() and friends is much shorter than declaring a new type and a method on it that implements the desired interface.

So the types that implement HTMLer and friends would be part of a library (a Go package) and thus compiled, not interpreted.

Then I agree that gomacro limitations on interpreted types and interpreted interfaces are not a problem for this use case, which is geared toward libraries that want better integration with gophernotes.

Conclusion: you've convinced me :) now it's just a matter of deciding the name and signature of interfaces and their methods - unless @SpencerPark, @sbinet or @dwhitena have objections, of course

Thanks for that really great explanation @cosmos72! This expresses well, where the limitations of the reflect package leave us if we take this approach.

That's a good point @kesmit13, I don't see many notebooks defining the _repr_*_ magics. I'm in for the display data interfaces! Coming from a Java background my naming preference is full sentences :) but I am liking the *er pattern with a single method with the same name as the *, HTMLer with HTML() string. Concise and predictable.

The return types for everything are outlined in that runtime.go file I linked above such that with the default Go json marshaling everything is the right type for a front end.

I am pretty agnostic.

Just FYI, for neugram - a Go script interpreter - I had implemented a slightly different yet similar approach: use the type "sniffer" from net/http to infer the []byte payload's type.
see (BSD-like, so no worries):

users would just have to call a func like so:

go> display(o)

where:

func display(v interface{}) []byte { ... }

this could of course be turned into a proper interface:

type JupyterDisplayer interface {
    Display() []byte
}

Thanks @sbinet
I already implemented your suggestion to autodetect []byte payload type if the user-specified mimetype is empty - it's used by display.Byte, display.File, display.String, display.Reader, display.WriterTo and display.Any.

In particular, the last one is modeled after your

func display(v interface{}) []byte { ... }

as its signature is

func Any(mimeType string, data interface{}) Data { ... }

and as I said above it will try to autodetect mimeType if the received one is empty

Update: I am considering the whole picture, and while I am now convinced with the idea of having interfaces like

type HTMLer interface {
  HTML() string
}

I am also a bit concerned about the sheer number of already existing functions display.* and the similarly large number of interface we would need to define: at least one each for HTML, Javascript, JSON, LaTeX, LaTeX math, Markdown, JPEG, PNG, SVG - and the list can only grow.

So I was thinking about something more similar to what @sbinet did in neugram, i.e. using interface{} and, optionally, mimetype autodetection.

Instead of having HTMLer, JSONer, PNGer etc. my proposal is to add a single interface

type Displayer interface {
  Display() (mimetype string, data interface{})
}

and use autodetection if the returned mimetype is empty. The returned data interface{} should be one of: string, map[string]interface{}, []byte, io.Reader, io.WriterTo, image.Image (this would be tested at runtime).

Similarly, the many existing functions display.* could be reduced: I would remove at least display.Bytes, display.Reader, display.String, display.WriterTo, keep the rest of display.<NAME>, and add a single

func Auto(data interface{}) Data

that simply calls display.Any("", data), which already performs mimetype autodetection.

Comments?

What do you do in the case that something can be rendered multiple ways? For example, if the notebook is run in a terminal, plain text makes the most sense. In a browser, it could be either HTML or Javascript.

Maybe this doesn't apply to gophernotes, I'm not sure if it has other backends like the Python Jupyter kernel.

Good point, I will think about an alternative that supports it

Update: what about two interfaces?
The above one for simpler cases:

type SimpleDisplayer interface {
  Display() (mimetype string, data interface{})
}

and if a library wants to provide multiple representations, a fully general interface

type Displayer interface {
  Display() struct {
	Data      map[string]interface{}
	Metadata  map[string]interface{}
	Transient map[string]interface{}
  }
}

For clarity, we can factor out the following

type BundledMIMEData = map[string]interface{}
type Data = struct {
  Data      BundledMIMEData
  Metadata  BundledMIMEData
  Transient BundledMIMEData
}

(which is almost exactly the current definition of gophernotes' display.Data) and the proposed interface thus becomes

type Displayer interface {
  Display() Data
}

I could see that working. One thing I'm wondering about, does the notebook engine have enough context to know what types in wants? I'm wondering if there is a way for it to request the types that it can render in order of preference, then the Display method can return the best one it can do. It's not necessary to make this work, but it could save rendering of extra thing that wouldn't get used.

I'm a little behind on the conversation but let me jump back a couple of messages and comment:

at least one each for HTML, Javascript, JSON, LaTeX, LaTeX math, Markdown, JPEG, PNG, SVG - and the list can only grow.

This list is missing one for MIMEBundle which covers any other possibility so that list shouldn't need to grow as the interfaces simply add some type safety to popular formats. I think omitting that for the most recent Display signature is much better though.

something more similar to what @sbinet did in neugram, i.e. using interface{} and, optionally, mimetype autodetection

I really like the autodetection but don't think it is a replacement as, discussed above, the case of rendering in multiple formats. The proposed interface returning the bundle of Data, Metadata, and Transient data makes a lot of sense here but I still think is orthogonal to the convenience interfaces. A display functionality like described is available in IPython as well but that doesn't discourage the additional common formats with their appropriate return types.

The interfaces are simple and should never change (HTML is always rendered as a string as serialized as such) so there isn't much maintenance on our side. I think it's a harmless addition.

One thing I'm wondering about, does the notebook engine have enough context to know what types in wants?

No it does not and one case to consider as to why this would be undesirable is that notebook outputs are persisted in the notebook so rendering in one frontend may prefer a format that another does not yet the notebook is not reexecuted between switching frontends so we want to keep all possible projections in there.

Maybe this doesn't apply to gophernotes, I'm not sure if it has other backends like the Python Jupyter kernel.

Gophernotes is a jupyter kernel so it should work with any frontend that speaks jupyter (protocol 5.1) :)

Let me try to summarize: the general interface

type Displayer interface {
  Display() struct {
	Data      map[string]interface{}
	Metadata  map[string]interface{}
	Transient map[string]interface{}
  }
}

will work, and supports multiple representations. It does not allow negotiating "preferred format(s)" between gophernotes and a library that implements it, but that's not a problem for several reasons:

  1. gophernotes does not have preferred format(s), and it should work with any frontend that speaks Jupyter protocol, not just Jupyter notebook.
  2. notebook outputs are persisted in files, which can be opened by any other frontend that understands them - again, it's not possible to identify "preferred format(s)".

Having solved the general case, let's turn to special cases and convenience interfaces for them:

type SimpleDisplayer interface {
  Display() (mimetype string, data interface{})
}

allows rendering in a single arbitrary format and (this is orthogonal) can support mimetype autodetection.

Going to even more specialized cases, we could add one or more of the interfaces discussed at the beginning:

type HTMLer interface {
    HTML() string
}
type JSONer interface {
  JSON() string
}
type PDFer interface {
  PDF() []byte
}
type PNGer interface {
  PNG() []byte
}
// ...

Is this correct?

Then for sure it's worth adding the first interface (the general one) and it's convenient to add the second one (the single arbitrary format).

I do not have strong opinions in favor or against the other specialized interfaces - I just noted that they may grow to a quite large number.

I think the Displayer interface or the individual interfaces are the only ones flexible enough to handle all of the current use-cases. I could go either way on them. The individual interfaces feel more like "Go" (e.g., io.Reader, io.Writer, Stringer) and correlate well with the IPython _repr_*_ methods, but may require updates in the future to add other. I could be happy with either. Odds are, I would personally write the HTML, JSON, etc. methods on my objects and write a Display method that just checked for those and populate the Data field in the result with the results of those calls.

Not to nitpick on the terminology but I think it conceptually makes a big enough difference here to mention: s/Display/Render/g. Where display is the action of publishing something that was rendered.

So I agree with this interface as a must (name pending):

type Displayer interface {
  Render() struct {
	Data      map[string]interface{}
	Metadata  map[string]interface{}
	Transient map[string]interface{}
  }
}

if that interface is implemented, no other considerations for rendering should be considered. We take this as "you want control of how your output is rendered".

Then since there some common formats and as @kesmit13 mentions, the popular implementation for Render() would simply include those if defined, we can facilitate this behavior and bring some consistency to libraries that support richer rendering in gophernotes. i.e. if Displayer is not implemented, we provide an implicit one that includes the outputs for all of the common interfaces that are implemented.

I understand the concern here about the amount but that number is 7, give or take a few, which IPython has held strong to for almost 3 years so I don't think there is too much to worry about.

Thanks to both for the feedback!

I agree, Render sounds nice and is actually more accurate. Then the interface could be Renderer :)
I will implement that, and also the single mimetype version.

Currently, gophernotes knows about 10 different mime types and how to render them. Not excessive, but that's still 10 single-method interfaces.

I prepared pull request #169
You are welcome to review it and give feedback, :)

If I have time, I will try to fix the limitation that implementing HTMLer and friends in interpreted code does not work as expected.

Done. Now implementing HTMLer and friends in interpreted code work as expected :)
See the updated pull request #169

I will test it a little more, wait for any feedback and if all goes well merge it in a few days