Mixed elements and text nodes
This code sample:
const src = `
<?xml version="1.0"?>
<root>
<x>blah</x>
xlub
<x>blah</x>
</root>
`;
const parsed = parse(src);
Fails with
error: Uncaught SyntaxError: Expected next sequence to be "<", got "x" instead
throw new SyntaxError(`Expected next sequence to be "${content}", got "${this.peek(bytes)}" instead`);
I think it should parse, since mixing element nodes and text nodes is allowed. (The sample was reduced from some XHTML that did not parse.)
Yes indeed, it seems that mixed content is currently only supported when the first node is a text node; if the first node is a content (element) node, parsing fails as you described.
I'll look into it sometime next week, thanks for reporting this.
Hello!
Any news on this issue?
I ran into the same error, but with the following XML structure:
<parameters>
<!-- some comments -->
text line 1
text line 2
</parameters>
The comment is parsed correctly, but after that the parser expects a tag and tries to find <.
I tried to fix it in utils/parser.ts and found several spots where the parser wrongly assumes that the next thing to consume is a tag:
1) anywhere after line 53 but before line 74: check whether there could be a text node.
2) between 104 and 104: same.
3) especially between lines 199 and 200: a node is assumed and expected, but it can be text or a comment.
... not sure if more spots need to be reviewed.
Then I tried to insert the following between 199-200:
if (this.#peek(tokens.cdata.start) || !this.#peek(tokens.tag.start)) {
  Object.assign(tag, this.#text({ close: name, path: [...path, tag] }));
  continue;
}
It worked, but it only covers case 3) and is not foolproof.
Maybe the maintainer can give some suggestions?
Hi!
I'm so sorry, I completely forgot about this!
Yes indeed, as you noted, the main issue is around line 200.
I think your patch should cover most cases. The main reason it's currently not working is that the parser first checks whether the stream is at a cdata/text node and, if not, assumes that everything that follows is child content nodes. It should check again for cdata/text nodes between each child node, which is currently not done (and is what your patch actually fixes).
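To illustrate the point about re-checking between children, here is a minimal sketch. It is a toy scanner, not the library's parser: the `Child` type and the `parseChildren` function are hypothetical names made up for this example, and it only handles flat `<name>text</name>` children.

```typescript
// Toy illustration only: not the library's parser. All names are hypothetical.
type Child =
  | { kind: "element"; name: string; text: string }
  | { kind: "text"; value: string };

// Parses the children found inside a parent element's body.
// Only flat <name>text</name> children are supported, to keep the sketch short.
function parseChildren(body: string): Child[] {
  const children: Child[] = [];
  let pos = 0;
  while (pos < body.length) {
    // Key point: decide between text and tag on EVERY iteration,
    // not just once before the loop starts.
    if (body[pos] !== "<") {
      const next = body.indexOf("<", pos);
      const end = next === -1 ? body.length : next;
      const value = body.slice(pos, end).trim();
      if (value) children.push({ kind: "text", value }); // skip whitespace-only runs
      pos = end;
      continue;
    }
    const match = /^<([A-Za-z_][\w.-]*)>([^<]*)<\/\1>/.exec(body.slice(pos));
    if (!match) throw new SyntaxError(`Unsupported markup at offset ${pos}`);
    children.push({ kind: "element", name: match[1], text: match[2] });
    pos += match[0].length;
  }
  return children;
}

// Body of <root> from the first post: element, text, element.
const mixed = parseChildren("\n<x>blah</x>\nxlub\n<x>blah</x>\n");
console.log(mixed);
```

Because the text-or-tag decision is made inside the loop, the `xlub` line between the two `<x>` elements is picked up as a text child instead of triggering an "expected `<`" error.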
Also, your issue may be slightly different from the one reported in the first post: it seems that comments are not peeked/consumed if they're the first child (there's no this.#peek(tokens.comment.start) in the mentioned lines), which I think is why it fails in the example you provided.
I'm fine with the proposed patch, but I'm not sure whether the output matches the behaviour currently stated in the docs, which is to treat mixed content as raw text. If not, maybe one solution would be to store the stream cursor position before entering the conditional and, if it fails to consume a tag start token, move the stream cursor back to that position and force the text-consuming path?
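The backtracking idea could look roughly like the sketch below. The `Cursor` class and the `tagOrText` function are hypothetical stand-ins written for this example; they are not the library's stream API, and only the error message format is borrowed from the report above.

```typescript
// Hypothetical minimal stream, not the library's real class.
class Cursor {
  src: string;
  pos = 0;
  constructor(src: string) {
    this.src = src;
  }
  // Consumes `expected` or throws, mirroring the error from the report.
  consume(expected: string): void {
    if (!this.src.startsWith(expected, this.pos)) {
      throw new SyntaxError(
        `Expected next sequence to be "${expected}", got "${this.src[this.pos]}" instead`,
      );
    }
    this.pos += expected.length;
  }
  // Reads up to (not including) `stop`, or to the end of input.
  readUntil(stop: string): string {
    const index = this.src.indexOf(stop, this.pos);
    const end = index === -1 ? this.src.length : index;
    const out = this.src.slice(this.pos, end);
    this.pos = end;
    return out;
  }
}

// Tries the tag path first; on failure, restores the cursor and reads text.
function tagOrText(cursor: Cursor): { tag: string } | { text: string } {
  const saved = cursor.pos; // remember where we started
  try {
    cursor.consume("<");
    const name = cursor.readUntil(">");
    cursor.consume(">");
    return { tag: name };
  } catch (error) {
    if (!(error instanceof SyntaxError)) throw error;
    cursor.pos = saved; // backtrack to the saved position
    return { text: cursor.readUntil("<") }; // force the text path
  }
}

const first = tagOrText(new Cursor("<x>blah"));
const second = tagOrText(new Cursor("xlub\n<x>"));
console.log(first, second);
```

The trade-off compared with peeking is that the tag path may do some work before failing, but the caller never sees a half-consumed stream because the position is always restored before the text path runs.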
As for case 1, I don't think it's possible to have a text node at the same level as the prolog/doctype/root node, but I could be wrong.
For case 2, it seems the line numbers you provided are the same, but assuming you mean these lines:
https://github.com/lowlighter/xml/blob/448b77319701201654c8dc3f5a6ea7451fdf9f90/utils/parser.ts#L96-L105
Then it should already be covered by the case 3 fix.
Anyway, thanks a lot for investigating this, I really appreciate it!