Shinmera/plump

Serialisation Modes


Currently Plump's serialisation strictly delivers XHTML/XML; however, it is often desirable to serialise to HTML instead. Additional hooks are necessary to allow this behaviour. Something similar to the parser's tag-dispatcher list might be a good idea.

That would be very helpful indeed. Annoying issue I just encountered when using plump in conjunction with colorize:

CL-USER> (plump:serialize (plump:parse "<em><span></span><b>foo</b></em>"))
<em><span/><b>foo</b></em>

(expected result: <em><span></span><b>foo</b></em>)

HTML5 ignores the trailing slash on non-void elements, so the browser reads this <span/> as a start tag; the whole fragment is logically read like this:

<em><span><b>foo</b></span></em>

Is there an idiomatic quick workaround, or do I have to embed invisible unicode chars in empty tags? :).

My current quick workaround:

;; Tags that must always be written with an explicit end tag; add more as needed.
(defparameter +plump-dont-self-close-tags+ '("span"))

(defmethod plump:serialize-object :around ((node plump:element))
  (let ((tag-name (plump:tag-name node)))
    (if (and (zerop (length (plump:children node)))
             (member tag-name +plump-dont-self-close-tags+ :test #'string-equal))
        ;; Empty element on the list: write <tag attrs></tag> by hand.
        (progn
          (format plump:*stream* "<~A" tag-name)
          (plump:serialize (plump:attributes node) plump:*stream*)
          (format plump:*stream* "></~A>" tag-name))
        ;; Everything else goes through Plump's normal serialisation.
        (call-next-method))))

Alternatively you could change your server to output XHTML5.

Technically yes, but that opens another huge can of worms (not everything I output on the page goes through plump serialization) :).

So I've been thinking about this some more now. It would be nice if I could couple the printing logic with the parsing logic of the tag-dispatchers, as that would avoid duplicating some information about special tags and would allow using the same system for both input and output. However, this in turn would incur backwards-incompatible changes to the tag-dispatcher system, and it would require the plump-dom system to depend on plump-parser.
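To make the coupling idea concrete, here is a minimal sketch of a dispatcher table in which each entry carries a printer next to its tag test. It is plain Common Lisp and does not use Plump's actual tag-dispatcher API; `toy-dispatcher`, `register-dispatcher`, `default-printer` and `print-tag` are hypothetical names, the parser half of each entry is left out, and element content is passed in as an already-serialised string to keep things short.

```lisp
;; Hypothetical record: one entry per special tag. In the idea discussed
;; above, the same record would also hold the parser function.
(defstruct toy-dispatcher
  name      ; symbol naming the rule
  test      ; function: tag name -> generalized boolean
  printer)  ; function: (tag content stream), writes the element

(defvar *toy-dispatchers* '())

(defun register-dispatcher (name test printer)
  (push (make-toy-dispatcher :name name :test test :printer printer)
        *toy-dispatchers*))

;; XML-style fallback: empty elements are self-closed.
(defun default-printer (tag content stream)
  (if (string= content "")
      (format stream "<~a/>" tag)
      (format stream "<~a>~a</~a>" tag content tag)))

;; Use the first dispatcher whose test accepts the tag, else the fallback.
(defun print-tag (tag content &optional (stream *standard-output*))
  (let ((rule (find-if (lambda (d) (funcall (toy-dispatcher-test d) tag))
                       *toy-dispatchers*)))
    (funcall (if rule (toy-dispatcher-printer rule) #'default-printer)
             tag content stream)))

;; Example rule: these tags always get an explicit end tag.
(register-dispatcher
 'never-self-close
 (lambda (tag) (member tag '("span" "iframe" "script") :test #'string-equal))
 (lambda (tag content stream)
   (format stream "<~a>~a</~a>" tag content tag)))
```

With this table, `(print-tag "span" "")` writes an explicit start and end tag, while a tag without a matching rule falls back to the self-closing form.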

I don't know if this is worth it.

The other alternatives I can see are:

  1. Add a similar tag-dispatcher system to plump-dom itself and replicate the necessary information about special tags.
  2. Add an extra argument to serialize-object on which to dispatch the kind of serialisation.
  3. Add an extra function to print elements with, which has an extra argument for the kind of serialisation.

None of these are nice, because they all require a separate interface to get consistent I/O from plump if you're using both the parser and printer. Option 2 has the additional issue of being backwards-incompatible.
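For illustration, option 3 could look roughly like the sketch below: a separate generic function that dispatches on a mode argument. This is plain Common Lisp with a toy node type, not Plump's classes; `toy-element`, `print-node`, `serialize-as` and `*html-void-tags*` are hypothetical names, the void-tag list is partial, and text is written without any escaping.

```lisp
;; Toy stand-in for a DOM element; children are toy-elements or strings.
(defstruct (toy-element (:constructor make-toy-element (tag &rest children)))
  tag
  children)

;; Partial list of HTML void elements (never take an end tag).
(defparameter *html-void-tags*
  '("area" "base" "br" "col" "embed" "hr" "img" "input"
    "link" "meta" "source" "track" "wbr"))

(defun void-tag-p (tag)
  (member tag *html-void-tags* :test #'string-equal))

(defgeneric print-node (node mode stream)
  (:documentation "Write NODE to STREAM; MODE is :xml or :html."))

;; Text nodes print the same way in every mode (no escaping in this sketch).
(defmethod print-node ((node string) mode stream)
  (declare (ignore mode))
  (write-string node stream))

(defun print-children (node mode stream)
  (dolist (child (toy-element-children node))
    (print-node child mode stream)))

;; XML mode: any childless element is self-closed.
(defmethod print-node ((node toy-element) (mode (eql :xml)) stream)
  (let ((tag (toy-element-tag node)))
    (cond ((null (toy-element-children node))
           (format stream "<~a/>" tag))
          (t
           (format stream "<~a>" tag)
           (print-children node mode stream)
           (format stream "</~a>" tag)))))

;; HTML mode: void elements get no end tag, all others always get one.
(defmethod print-node ((node toy-element) (mode (eql :html)) stream)
  (let ((tag (toy-element-tag node)))
    (format stream "<~a>" tag)
    (unless (void-tag-p tag)
      (print-children node mode stream)
      (format stream "</~a>" tag))))

(defun serialize-as (node mode)
  (with-output-to-string (out)
    (print-node node mode out)))
```

Applied to the example from earlier in the thread, `(serialize-as doc :xml)` keeps the self-closed span, while `(serialize-as doc :html)` writes it with an explicit end tag. Because the mode lives in a new function, existing serialize-object methods would not need to change.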

I really don't know what the right way to go about it is.

One other thing that might work is to have a mode that records whether or not the tag was self-closed in the original document and then matches that in the serialised form. This would be less useful for the purposes of normalisation, but it would prevent bugs like a self-closed iframe swallowing the rest of the page.
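A rough sketch of that recording mode, again in plain Common Lisp rather than against Plump's real classes: the element carries a flag that a parser would set when it reads `<tag/>` in the source, and the printer only self-closes when that flag is set. `recorded-element`, its `self-closed-p` slot and `print-recorded` are hypothetical names, and the parser side is assumed rather than shown.

```lisp
(defstruct recorded-element
  tag
  (children '())
  ;; Assumed to be set by the parser when the source contained <tag/>.
  (self-closed-p nil))

(defun print-recorded (node stream)
  (etypecase node
    (string (write-string node stream))
    (recorded-element
     (let ((tag (recorded-element-tag node))
           (children (recorded-element-children node)))
       (if (and (recorded-element-self-closed-p node) (null children))
           ;; Mirror the source: it was self-closed and is still empty.
           (format stream "<~a/>" tag)
           ;; Otherwise always write an explicit end tag.
           (progn
             (format stream "<~a>" tag)
             (dolist (child children)
               (print-recorded child stream))
             (format stream "</~a>" tag)))))))

(defun recorded-to-string (node)
  (with-output-to-string (out)
    (print-recorded node out)))
```

An iframe parsed from `<iframe></iframe>` would have the flag unset and so round-trips with its end tag intact, whereas an element that was self-closed in the source is written back in the same form.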