Spaces between tags are treated as pcdata
Mbodin opened this issue ยท 7 comments
Consider the following OCaml file:
open Tyxml
let%svg test = {|<g> </g>|} (* Notice the space between the opening and closing of the g tag. *)
The content of the g tag is interpreted as a [> `PCDATA ] Tyxml_svg.elt
which is incompatible with [< Svg_types.g_content ] Tyxml_svg.elt
, and I get a type error. In other words the space between the opening and closing of the g tag is interpreted as a pcdata. The same goes if I place a newline instead of the space.
I'm surprised as the SVG specification itself uses a lot of spacing in its examples. For instance the following SVG is provided as a valid example in https://www.w3.org/TR/2003/REC-SVG11-20030114/struct.html#GroupsOverview despite having the same spacing between its g tags.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="4in" height="3in" version="1.1"
xmlns="http://www.w3.org/2000/svg">
<desc>Groups can nest
</desc>
<g>
<g>
<g>
</g>
</g>
</g>
</svg>
I don't know the SVG specification enough to state what is meant to be there: should text with only white spaces be ignored instead of being treated as pcdata?
We had a similar issue in HTML (I though it was in table). Some elements should have a special parsing mode to be white-space insensitive. We introduced it for appropriate HTML elements, so it's a matter of reusing that for the appropriate SVG elements.
Thanks! I can see the discussion in #225.
I haven't found a place in the SVG specification listing which elements should be whitespace insensitive and which shouldn't. Would treating any tags not accepting pcdata as whitespace insensitive be too much?
I'm not quite sure. @aantron , you probably have a bit more expertise there, do you have an opinion ?
I don't have an opinion on this. All I can say is that for let%svg
it should probably be parsed or adjusted according to the SVG specification, which I am not familiar with. For let%html
it should be parsed according to the HTML specification. SVG tags inside an HTML document are not parsed according to the SVG specification, but according to the HTML rules for foreign elements inside HTML, which specifically includes rules for SVG. So on a syntax level, SVG inside HTML is not SVG proper, but looks like SVG.
So on a syntax level, SVG inside HTML is not SVG proper, but looks like SVG.
<insert appropriate swearing/> ๐คฆ
I think we can probably gloss over that.
@Mbodin I agree with your proposal, to ignore all whitespace for elements that don't accept pcdata. It might be worth glancing at the spec if something is specified in that regard, but otherwise, that will suffice.
Is there a fix for this?
It seems like pcdata should be allowed, especially for examples that use <text>somestuff</text>
.
I have been unable to work with types like this which is unfortunate.
@ncitron I think that the above-mentioned branch https://github.com/Mbodin/tyxml/tree/whitespace (PR #331) fixes the issue.
pcdata
is definitively allowed. I did not (at least intentionally) change the behaviour of any tag already accepting pcdata
, so the behaviour of <text>
should be unchanged. I however accepted white-space-only pcdata
(which are ignored) in all the other tags.