purescript-halogen/purescript-halogen

Render external HTML

lthms opened this issue · 31 comments

lthms commented

Hi everyone.

I love Halogen and try to use it for several of my projects where I need a web ui. Unfortunately, for one of them, I face an issue I have no idea how I can solve.

In this project, the UI will be used to display some HTML retrieved from a REST API. I want to display this HTML, but I cannot find how this can be done using Halogen, as it protects against code injection by default. Do you have any idea how I can do that?

garyb commented

I think you'll have to do something slightly hacky with innerHTML for this:

  1. Use a ref to capture the element for the container you want to display your pre-rendered HTML in
  2. Ensure there is a key on it, as a hint to virtual-dom that the element should be preserved during each render cycle
  3. When you get your data, set the innerHTML of the ref-captured element

I think unless we introduced a parser to produce Halogen-HTML from an input string then that's the only option at the moment.

lthms commented

Thanks for the quick answer! Just to be sure, you are saying that it is not possible to do it using the render function of a component? If you have some time, do you think you could give me a minimal example? (I will search myself, but I am not familiar with virtual-dom…)

> I think unless we introduced a parser to produce Halogen-HTML from an input string then that's the only option at the moment.

Why not add a new constructor to HTML that won’t be sanitized? UnsafeText or something

garyb commented

Yeah, that is what I'm saying unfortunately. I don't have time to mock up an example right now but will try to remember to come back and do so later (I can perhaps add it to the guide for people who run into this in the future!).

The problem is down to using virtual-dom here, so even extending HTML with a constructor doesn't solve the problem: the idea of virtual-dom is:

  • there's an in-memory data representation of the HTML
  • every time there's a re-render this data representation is diffed against the previous tree to generate a patch
  • the patch is then applied to the DOM

The idea is to minimize the amount of DOM mutation that goes on, since that's the slow part of rendering - the patch (in a perfect world) contains the minimal changes necessary to update the DOM to the desired result.

So, the problem: we have the HTML as a blob of text, so it doesn't fit into this diffable tree representation, preventing us from integrating it into the normal render process.
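
To make the diff-and-patch idea concrete, here is a toy sketch in JavaScript. It is not how Halogen or virtual-dom is implemented; the names `h` and `diff` and the patch format are made up for illustration. It shows that a structured tree yields a small patch, while an HTML string can only be compared (and replaced) as a whole.

```javascript
// Toy virtual DOM: a node is either a string (text) or { tag, children }.
// Illustration only; the helper names and patch shape are hypothetical.
const h = (tag, ...children) => ({ tag, children });

// Compare two trees and collect the list of changes (the "patch").
function diff(oldNode, newNode, path = "root") {
  if (typeof oldNode === "string" || typeof newNode === "string") {
    // Text: either identical, or replaced as a whole.
    return oldNode === newNode ? [] : [{ op: "replace", path, node: newNode }];
  }
  if (oldNode.tag !== newNode.tag) {
    return [{ op: "replace", path, node: newNode }];
  }
  const patches = [];
  const len = Math.max(oldNode.children.length, newNode.children.length);
  for (let i = 0; i < len; i++) {
    const childPath = `${path}/${i}`;
    const a = oldNode.children[i];
    const b = newNode.children[i];
    if (a === undefined) patches.push({ op: "insert", path: childPath, node: b });
    else if (b === undefined) patches.push({ op: "remove", path: childPath });
    else patches.push(...diff(a, b, childPath));
  }
  return patches;
}

// Structured trees: only the changed text node ends up in the patch.
const before = h("div", h("h1", "Title"), h("p", "old text"));
const after = h("div", h("h1", "Title"), h("p", "new text"));
const patch = diff(before, after);
console.log(patch); // a single "replace" at root/1/0

// An HTML string has no structure the diff can look into,
// so any change means replacing the whole blob.
const rawPatch = diff("<p>old <em>text</em></p>", "<p>new <em>text</em></p>");
console.log(rawPatch); // a single "replace" at root
```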

lthms commented

> (I can perhaps add it to the guide for people who run into this in the future!).

I think it would be very valuable (at least for me! d: ) In the meantime, I will try to have a look at what you said but I am very unfamiliar with all of this.

Anyway, I have a PureScript version of my parser/generator for offline and realtime preview and it is general enough to produce HTML output. However, even that is not totally possible, as I tried and values such as &nbsp; are also sanitized ):

Thanks again for your help and hints. I will try to write something myself, but I am definitely interested in your mock version (:

lthms commented

Finally I was able to produce an Array of HTML and it works! Injecting HTML still seems like a good feature to have, but I found some references to the ref thing you were talking about, so if I really need it, I think I will be fine (and if that is the case, I will try to do a PR here to add some docs about it).

Thank you for your help!

lthms commented

Thank you very much! I will have a look as soon as I can.

I have the exact same problem.

By default, the highlighting will wrap highlighted text in <em> and </em>

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html

i can change the tag but it will still be html .. or i have to parse the database result and concat it together which adds unnecessary overhead.

I don't require this html to be parsed and analyzed. When the string needs to re-render it can re-render the "entire" html blob (in my case this blob is just a few <em></em> tags .. so i don't worry about this).

@lethom can you post up your solution? I would like to compare it with the one from @ibrahimsag

I hope that makes it into an official component !

rnons commented

Closing this as @rnons' library looks like it covers the original use case and there aren't plans to move any functionality like that into Halogen itself, but I can reopen if further discussion is needed.

@thomashoneyman can you re-open?

The workaround is not so great and i think there is room for a better solution, given the caveat that some features should not be expected to work on the inserted html (see below).


First why is the current situation not really ideal:

Problem with halogen/vdom not supporting first class HTML

  1. Not all html elements are supported by halogen. (this can be fixed with effort of course)
  2. Halogen can not be used with web components
  3. Can not use html generated by other tools
  4. VDom wants to make Node in tree for every html tag found -> This is unnecessary for static html causing overhead on runtime.
  5. VDom wants to make Node in tree for every html tag found -> every tag supported means more library code to 1. maintain by library authors 2. ship to client.

Problems with component workaround:

  1. Overhead from widget life cycle
  2. Slot types bubble up to parent component -> Can not compose in single place. Needs HTML + slot management in different places because no encapsulation.
  3. Slot types bubble up to parent component -> Can not compose safely (compile time) because of runtime warning Halogen: Duplicate slot address was detected during rendering, unexpected results may occur

Going back to original problem description:

> So, the problem: we have the HTML as a blob of text, so it doesn't fit into this diffable tree representation, preventing us from integrating it into the normal render process.

What is the problem of not fitting in the tree representation? Slap on another data constructor on the tree node sum type. Whenever handling this constructor the entire element, including all its children, can be removed/inserted (depending on what the vdom wants to do).

Obviously you can not use a bunch of features such as click events in this piece of the DOM. Though when reading first post

> In this project, the UI will be used to display some HTML retrieved from a REST API.

I don't believe that in this use case it was expected to have features like events on the html loaded from the REST api. In my situation in 2017 this was not the case either. And in the new 2022 situation it's also not the case.


The lack of first class html is quite high friction when designing HTML first. For the website layout design it's really valuable to directly copy snippets of html, css, svg, etc. from the web and work on the design. The friction with the workaround is mainly due to having to manage slots and also checking the runtime console to see if there was a mistake somewhere with slot naming. Also, including an entire slot clutters code more than just a simple raw function. The developer UX can be better in this scenario.

garyb commented

I'm not sure I understand, a lot of the things you're talking about don't seem to be related to "I have a dynamic chunk of HTML I want to integrate in a Halogen app"?

@garyb

nobody mentioned "dynamic html". What are you talking about?

garyb commented

That's what the issue is about. Loading HTML from an external source as a string and integrating it into a Halogen app.

When you have a blob of HTML and want to load that into the DOM that doesn't make the html itself "dynamic".

What do you mean with integrating? What kind of integration are you talking about?

Just for clarity what i am talking about and other posters is a static string of html such as in this snippet https://gist.github.com/prathje/7422e49b7c809fe8236bb2f213e7076e#file-parentview-purs-L30

garyb commented

Fair enough, these terms are very open to interpretation, so my meanings:

  • dynamic: not statically available at the time the compiler is run
  • integrated: included as subtree of the DOM elements rendered by Halogen

From the gist you linked it appears we are talking about the same thing, mostly.

Having gone back and re-parsed your comment here #324 (comment) I see there is actually a proposal in the middle of it. It would have been much clearer if you'd just posted that without including the tangents about problems you have with slots and Halogen's HTML syntax. 😄

Reasons why including a node in the VDOM for HTML DOM subtrees is not necessarily a perfect solution to this either:

  • It doesn't solve the problem of turning a String into a HTML DOM subtree which was half of the problem this issue was addressing.
  • It introduces a new type of footgun, where if the same subtree is used in multiple places in the VDOM it will only appear in the last place that is rendered by Halogen (which won't necessarily be consistent if it's referred to in multiple components, since components have their own render lifecycles).
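
The second point follows from how the DOM works: a node has at most one parent, and appending a node that is already attached moves it rather than copying it. The sketch below models that with a made-up `ToyNode` class so it runs outside a browser; it is not Halogen code and not a proposed implementation.

```javascript
// Toy model of DOM parenting. `ToyNode` is hypothetical; it only mimics the
// rule that a node has a single parent and appendChild re-parents it.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.parent = null;
    this.children = [];
  }
  appendChild(child) {
    // Like the real DOM: detach from the current parent first.
    if (child.parent !== null) {
      child.parent.children = child.parent.children.filter((c) => c !== child);
    }
    child.parent = this;
    this.children.push(child);
    return child;
  }
}

const first = new ToyNode("first-container");
const second = new ToyNode("second-container");
const raw = new ToyNode("raw-subtree");

first.appendChild(raw);
second.appendChild(raw); // moves `raw` out of `first`; it is not copied

console.log(first.children.length, second.children.length); // 0 1
```

In a browser, showing the same subtree in two places would need a separate copy for each, for example via `cloneNode(true)`.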

It's definitely not a bad idea, but I'm not sure it's so much better than the current alternatives that it's worth introducing so that anyone can shoot themselves in the foot with it. Halogen has more of those than I'd like already 😕 (duplicate slot addresses and initializer deadlocks being the main two).

I wonder if something like the portal stuff Thomas has been working on would be a solution to the kind of problems you're having, since it seems like your motivation is to avoid using Halogen's VDOM and components as much as possible - so essentially reversing the thing here, and inserting Halogen components into existing HTML where they're necessary, rather than other way around?

> I see there is actually a proposal in the middle of it.

Yes it might have been better to structure the text differently

> It would have been much clearer if you'd just posted that without including the tangents

I felt it was necessary to include that because it is the consequence of the current workaround

> if the same subtree is used in multiple places in the VDOM it will only appear in the last place that is rendered by Halogen

Sounds like something of pointers / object references. Purescript copies values by default. Only when you use IORef you get a pointer-like effect where multiple references can point to same shared memory. When halogen does a single render cycle and encounters subtree in VDOM multiple times it should also insert into DOM multiple times.

Conceptually it is similar to a node with just text. When you have something like

myText :: forall w i. HTML w i
myText = HH.text "hello world"

and pass this around (make copies of this value) into different halogen html templates, it should and it does render this text in various places.

Since PureScript, and by extension Halogen, has the full power of JavaScript at its disposal, it doesn't seem impossible. What about doing something similar to how text does it?

  1. halogen text function
  2. vdom primitive for text -- this was the point where i was proposing to add another data constructor
  3. constructing virtual dom
  4. constructing text node in vdom
  5. direct call to createTextNode function

> It doesn't solve the problem of turning a String into a HTML DOM subtree which was half of the problem this issue was addressing.

The DOM will do this automatically when you pass a (valid) html string to the innerHTML property https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML

garyb commented

Is this the proposed modification to VDom you had in mind?

import Web.DOM as DOM

data VDom a w
  = Text String
  | Elem (Maybe Namespace) ElemName a (Array (VDom a w))
  | Keyed (Maybe Namespace) ElemName a (Array (Tuple String (VDom a w)))
  | Widget w
  | Grafted (Graft a w)
  | Raw DOM.Node -- <-- this

> | Raw DOM.Node -- <-- this

FWIW, I think if you were to add this, it should be as a widget in Halogen. I think the vdom internals should be self contained and environment agnostic.

> Is this the proposed modification to VDom you had in mind?

Something like that ye. Possibly just Raw String or what nate suggests to put it in widget place seems also reasonable. It would have to depend on what is convenient codewise to put what where and how the diffing of VDOM tree and DOM tree has to be done.

Hey, I think I ran into exactly this issue writing my blog in Purescript and thought I could add a real use case.

I am precompiling markdown blog posts with pandoc using various filters (e.g. katex for math). I was able to modify and use https://github.com/rnons/purescript-html-parser-halogen for this to read the precompiled .html files, including inline svg that need to go into a different namespace, and render them in halogen. This works very well for smaller posts and it's super fast but I am running into performance issues on larger posts with a lot of math, e.g. in this one (source). It's not as much a problem on my laptop but on mobile it takes seconds to open this page and that's not due to downloading.

I don't understand how halogen works under the hood but my suspicion is that this is due to parsing, converting the whole html file from String -> Html.Parser data types -> Halogen data types -> injection into browser dom. I understand the appeal of having the html parsed into the correct data types but a workaround to skip all of this and directly injecting it into the browser dom seems desirable as well for speed (@rnons, maybe the parsing library could be optimized but not sure). I have also used the markdownit-halogen module directly (with katex and highlightjs adding lots of extra nodes) with similar performance problems which isn't a surprise because it uses the same parsing library.

I may be completely off with this suspicion so please don't hesitate to correct if I'm saying anything wrong.

garyb commented

@MMesch I'd recommend this approach #324 (comment) for what you're trying to do just now (or maybe even consider using an iframe that is then resized to fit the inner content!).

In your case I'd feel perfectly safe using innerHTML, as it's for content you prepared - generally I'm wary of it, as using innerHTML directly opens things up to XSS exploits.

There is the #787 PR, but that is going to perform worse than managing innerHTML yourself as it doesn't handle patching (so you have to pay for the browser parsing the string and reconstructing the DOM subtree every render).
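
As a rough sketch of what "managing innerHTML yourself" can look like on the JavaScript side, the helper below skips the assignment when the string has not changed, so the browser does not re-parse it on every render. The name `setInnerHTMLIfChanged` and the fake element are assumptions for illustration; they are not part of Halogen or of the #787 PR.

```javascript
// Hypothetical helper: remember the last string written to each element and
// skip the assignment when nothing changed.
const lastWritten = new WeakMap();

function setInnerHTMLIfChanged(el, html) {
  if (lastWritten.get(el) === html) return false; // unchanged: no re-parse
  el.innerHTML = html; // in a browser, this parses `html` and rebuilds the subtree
  lastWritten.set(el, html);
  return true;
}

// Stand-in for a DOM element that counts how often innerHTML is assigned,
// so the sketch can run outside a browser.
function makeFakeElement() {
  let value = "";
  const el = { writes: 0 };
  Object.defineProperty(el, "innerHTML", {
    get: () => value,
    set: (v) => {
      value = v;
      el.writes += 1;
    },
  });
  return el;
}

const el = makeFakeElement();
setInnerHTMLIfChanged(el, "<em>hit</em>");
setInnerHTMLIfChanged(el, "<em>hit</em>"); // same string: skipped
setInnerHTMLIfChanged(el, "<em>other</em>");
console.log(el.writes); // 2
```

The last written string is tracked separately instead of being compared with `el.innerHTML`, because a real browser serializes the parsed DOM when that property is read, and the result need not match the input string character for character.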

Thanks @garyb . I guess I'll find out but do you anticipate that innerHTML will work correctly with inline svg and mathml elements that have a different namespace than html nodes?

garyb commented

I would expect that to work, yeah! It should act just like the browser does when encountering that markup in a normal document as far as I know.

+1

I'm running into this problem now. I've got a webpack config setup such that it loads markdown files into HTML via markdown-loader, which I then import in a JS file and FFI into PureScript. I'm using webpack-dev-server for hot reloading and there are times when switching between pages does not update the content.

> I think you'll have to do something slightly hacky with innerHTML for this:
>
> 1. Use a `ref` to capture the element for the container you want to display your pre-rendered HTML in
> 2. Ensure there is a `key` on it, as a hint to `virtual-dom` that the element should be preserved during each render cycle
> 3. When you get your data, set the `innerHTML` of the `ref`-captured element
>
> I think unless we introduced a parser to produce Halogen-HTML from an input string then that's the only option at the moment.

If I'm understanding this correctly, the below code implements the above idea. If it does not, could you indicate where it is wrong? Here's what I currently have. Whether I use a Hooks- or Classic-style component, I find that switching between two routes in my application where each one is rendering a markdown page using the below implementation will sometimes not update the content to the new page's content:

module UI.Components.RawHtml where

import Prelude

import Data.Foldable (for_)
import Data.Maybe (Maybe(..))
import Data.Tuple (Tuple(..))
import Effect.Aff (Aff)
import Effect.Uncurried (EffectFn2, runEffectFn2)
import Halogen (Component, RefLabel(..), defaultEval, liftEffect, mkComponent, mkEval)
import Halogen as H
import Halogen as Halogen
import Halogen.HTML.Elements.Keyed as HK
import Halogen.HTML.Properties as P
import UI.Utils.Halogen (NoOutput, NoQuery)
import Web.HTML (HTMLElement)

foreign import data RawHtml :: Type

type Input = { key :: String, content :: RawHtml }

data Initialize = Initialize

renderRawHtml :: Component NoQuery Input NoOutput Aff
renderRawHtml = mkComponent
  { initialState: identity
  , render: \{ key } ->
      HK.div_
        [ Tuple key $
            HK.div
              [ P.ref $ keyedRefLabel key ]
              []
        ]
  , eval: mkEval $ defaultEval
      { initialize = Just Initialize
      , handleAction = case _ of
          Initialize -> do
            { key, content } <- H.get
            mbElem <- Halogen.getHTMLElementRef $ keyedRefLabel key
            for_ mbElem \elem -> do
              liftEffect $ runEffectFn2 unsafeSetInnerHTML elem content
      }
  }
  where
  keyedRefLabel key = RefLabel $ key <> "-ref-label"

-- On JS side:
-- export const unsafeSetInnerHTML = (el, content) => {
--  el.innerHTML = content;
-- };
foreign import unsafeSetInnerHTML :: EffectFn2 HTMLElement RawHtml Unit

Here was the Hooks one based on these gists: https://gist.github.com/ibrahimsag/e142652bcad3c8ade14a727ae952937a and/or https://gist.github.com/prathje/7422e49b7c809fe8236bb2f213e7076e:

renderRawHtml :: Component NoQuery Input NoOutput Aff
renderRawHtml = Hooks.component \_ { key, content } -> Hooks.do
  _ /\ rawHtmlId <- Hooks.useState false
  let containerRef = RefLabel "rawHtml"
  Hooks.useLifecycleEffect do
    log "Running raw html initializer"
    mbElem <- Halogen.getHTMLElementRef containerRef
    for_ mbElem \elem -> do
      isRendered <- Hooks.get rawHtmlId
      unless isRendered do
        log "not yet rendered. Setting inner html"
        Hooks.put rawHtmlId true
        liftEffect $ runEffectFn2 unsafeSetInnerHTML elem content
        log "Finished setting inner html"
    pure Nothing
  Hooks.pure $ do
    let _ = spy "renderRawHtml - paint" ""
    HK.div_
      [ Tuple key $
          HK.div
            [ P.ref containerRef ]
            []
      ]

@JordanMartinez I have had a very similar problem recently, and for my purposes https://github.com/rnons/purescript-html-parser-halogen worked well - it parses the html string into Halogen HTML.

This happened to me when updating an old application, in which the approach of using innerHTML on a ref used to work properly. I don't know whether the problem was in the changes I made, which could well be the case since the old version changed the DOM unnecessarily high up the tree, or some change in Halogen. The symptom I was getting seems to be just what you describe - sometimes switching the html didn't update the content.

That non-hooks version does look pretty much like what I would have done for this, yeah.

(Actually I wouldn't have bothered with keyed HTML either now, the suggestion was made pre-halogen-vdom - each component is rendered in isolation now, so in a component like this where eval only happens during Initialize the element with the ref won't move around in the rendered HTML. I don't think removing the keying would help though, just an aside since this is an issue from olden times).

I'm not at all sure why this would work intermittently, that does sound like a bug. But a few questions about what "changing page" means in this context:

  • Is the page a component with a RawHtml component inside it, or is the page content changing in the parent itself? (So, outer -> page -> raw, or outer -> raw component structure basically).

  • If it's outer -> page -> raw, is raw the only thing inside page? If so it might be the #657 / #766 / #802 issue rather than specific to this.

  • If it's outer -> raw, are you changing the slot name for RawHtml at the same time as updating its content?

#657 might be it. I'll explore that when I have time again.