ocsigen/tyxml

[RFC] Removing Xml.encodedpcdata

Drup opened this issue · 5 comments

Drup commented

I'm considering removing Xml.encodedpcdata from the API.

Mostly, it comes down to the fact that any "valid" use case, i.e., injecting textual HTML without validation, is almost never compatible with Tyxml's attempt at a representation-independent API. For instance, Tyxml's XML can correspond to DOM trees or a virtual DOM, where such unstructured textual HTML doesn't really make sense.

I believe we would be better off removing it and instead providing an easy-to-use bridge with Markup's parsing functions. You could still break HTML validity with it, but it would at least give us structured XML, which we can manipulate and use.

I've started working on such import/export functions in the export branch. In particular, it provides a function Xml.of_seq : signal Seq.t -> Xml.t that will, in time, be trivial to plug into Markup.
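To make the idea concrete, here is a minimal sketch of how a signal sequence could be folded into a structured tree. The `signal` and `xml` types below are simplified, hypothetical stand-ins, not the actual definitions from the export branch or from Markup, and this `of_seq` returns a list of top-level nodes rather than a single `Xml.t`.

```ocaml
(* Hypothetical, simplified types: NOT the real Tyxml or Markup definitions. *)
type signal =
  | Start_element of string * (string * string) list
  | End_element
  | Text of string

type xml =
  | Element of string * (string * string) list * xml list
  | Pcdata of string

(* Read sibling nodes until the matching End_element (or the end of input).
   Returns the nodes read and the remaining, unconsumed signals. *)
let rec children (seq : signal Seq.t) : xml list * signal Seq.t =
  match seq () with
  | Seq.Nil -> ([], Seq.empty)
  | Seq.Cons (End_element, rest) -> ([], rest)
  | Seq.Cons (Text s, rest) ->
    let siblings, rest = children rest in
    (Pcdata s :: siblings, rest)
  | Seq.Cons (Start_element (name, attrs), rest) ->
    (* First the element's own content, then whatever follows it. *)
    let kids, rest = children rest in
    let siblings, rest = children rest in
    (Element (name, attrs, kids) :: siblings, rest)

(* Sketch only: a stray End_element at top level silently stops the parse. *)
let of_seq (seq : signal Seq.t) : xml list = fst (children seq)

let doc =
  of_seq
    (List.to_seq
       [ Start_element ("p", [ ("class", "note") ]);
         Text "Hello, ";
         Start_element ("em", []);
         Text "world";
         End_element;
         End_element ])
```

Whatever the input text looked like, the result is a tree that can be traversed, rewritten, or handed to any backend, which an opaque string of HTML cannot offer.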

@vasilisp Any comments regarding the besport codebase? As far as I can tell, this function is not used directly in ocsigen itself, mostly for the reason given above.

Apparently odoc uses it to inject {[ .... ]}. @aantron, who should be pinged for that? :)

Removing support for raw text will force us to parse the content of {% … %} in odoc, which will require odoc to depend on an HTML parser. I'd say cc @dbuenzli, @rizo, @Ostera.

> Removing support for raw text will force us to parse the content of {% … %} in odoc, which will require odoc to depend on an HTML parser. I'd say cc @dbuenzli, @rizo, @Ostera.

You don't need to parse what is in the {%html: %}; just assume it is valid HTML (as ocamldoc has always done).

(And if that's a problem with tyxml's typing, I would rather suggest dropping tyxml, as has been discussed at some point.)

I don't think we are using encodedpcdata anywhere.

Drup commented

@dbuenzli The typing aspect is not the crux of the issue, as I explained above.

The issue is that HTML is not text. It simply has a textual representation. With encodedpcdata (or any function that allows you to inject raw HTML text), you make the assumption that HTML is, in fact, text, even in circumstances where it really isn't, such as DOM trees, virtual DOM, signals, etc.
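As a rough illustration of the point, consider a deliberately tiny, hypothetical signature (`XML`, `Text_backend`, `Tree_backend`, and `Page` are made-up names, far simpler than Tyxml's real interfaces). The same document is built against two backends, and only one of them is textual.

```ocaml
(* Hypothetical, minimal signature: NOT Tyxml's actual interface. *)
module type XML = sig
  type t
  val pcdata : string -> t
  val node : string -> t list -> t
end

(* Backend 1: the document is represented as serialized text. *)
module Text_backend = struct
  type t = string

  let escape s =
    let b = Buffer.create (String.length s) in
    String.iter
      (function
        | '<' -> Buffer.add_string b "&lt;"
        | '>' -> Buffer.add_string b "&gt;"
        | '&' -> Buffer.add_string b "&amp;"
        | c -> Buffer.add_char b c)
      s;
    Buffer.contents b

  let pcdata s = escape s

  let node name children =
    Printf.sprintf "<%s>%s</%s>" name (String.concat "" children) name
end

(* Backend 2: the document is a tree, as a DOM or virtual DOM would be. *)
module Tree_backend = struct
  type t = Text of string | Node of string * t list

  let pcdata s = Text s
  let node name children = Node (name, children)
end

(* A document written once, against the signature only. *)
module Page (X : XML) = struct
  let v = X.node "p" [ X.pcdata "1 < 2" ]
end

module As_text = Page (Text_backend)
module As_tree = Page (Tree_backend)
```

A raw-injection function `string -> t` is trivial for `Text_backend` (the identity), but `Tree_backend` has no honest way to implement it short of parsing the string or storing an opaque blob that nothing else can inspect.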