unifiedjs/ideas

XML AST


Hi,

I've been playing with some awesome mdast/hast unist stuff recently, and when I went through XML again I wondered whether we could use unist tools to work with XML trees. It looks like that's not possible yet?

To implement this, the steps would be:

a) create the syntax-tree

I guess this means defining the tree "model", which would be very much like this one? https://github.com/syntax-tree/hast#ast

b) create the parser/stringifier

I guess this would be something like https://github.com/rehypejs/rehype/tree/master/packages/rehype-parse and https://github.com/syntax-tree/hast-util-to-html?

> a) create the syntax-tree

Yup! Mostly like HAST. But with some added nodes for processing instructions, cdata, and whatnot!
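To make that a bit more concrete, here is a rough sketch of what such a tree could look like. The node shapes are an assumption modelled on hast (`element` with `name`, `attributes`, `children`), plus `instruction` and `cdata` nodes for the XML-specific bits; `countTypes` is a hypothetical helper for illustration only, not part of any package.

```javascript
// Illustrative XML tree in unist style (assumed shapes, modelled on hast).
// `instruction` and `cdata` are the XML-specific additions.
const tree = {
  type: 'root',
  children: [
    {type: 'instruction', name: 'xml', value: 'version="1.0" encoding="utf-8"'},
    {
      type: 'element',
      name: 'package',
      attributes: {version: '3.0'},
      children: [
        {type: 'comment', value: ' generated '},
        {
          type: 'element',
          name: 'title',
          attributes: {},
          children: [{type: 'text', value: 'Example'}]
        },
        {type: 'cdata', value: '<raw> & unescaped'}
      ]
    }
  ]
}

// Hypothetical helper: count nodes per type with a plain recursive walk.
function countTypes(node, counts = {}) {
  counts[node.type] = (counts[node.type] || 0) + 1
  for (const child of node.children || []) countTypes(child, counts)
  return counts
}

const typeCounts = countTypes(tree)
console.log(typeCounts)
```

Because every node has a `type` and parents have `children`, generic unist utilities should be able to walk a tree like this without knowing anything about XML.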

> b) create the parser/stringifier

Probably built on some other XML parser. Depends on how far you’d like to go. There’s some weird stuff (like custom entities) in XML!

@revolunet Are you into working on this?

Nope, I haven't had a chance yet :/

Oh that's okay! I think it's pretty interesting tho!

Is there any update on this one? @revolunet

Nothing new on my side, sorry.

https://github.com/nashwaan/xml-js looks promising as a starting point.

Nice, but I can’t see anything about positional info?
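For context, by positional info I mean the unist-style `position` field on each node: `start` and `end` points with a 1-indexed `line` and `column` and a 0-indexed `offset`. A minimal sketch of deriving that from a character offset, assuming the underlying parser at least reports offsets (`pointFromOffset` is a hypothetical helper, not an existing API):

```javascript
// Hypothetical helper: turn a 0-indexed character offset into a unist-style
// point ({line, column, offset}), with line and column 1-indexed.
function pointFromOffset(source, offset) {
  let line = 1
  let column = 1
  for (let index = 0; index < offset; index++) {
    if (source[index] === '\n') {
      line++
      column = 1
    } else {
      column++
    }
  }
  return {line, column, offset}
}

const source = '<a>\n  <b/>\n</a>'
const startOffset = source.indexOf('<b/>')

// An element node carrying positional info for `<b/>`.
const node = {
  type: 'element',
  name: 'b',
  attributes: {},
  children: [],
  position: {
    start: pointFromOffset(source, startOffset),
    end: pointFromOffset(source, startOffset + '<b/>'.length)
  }
}

console.log(node.position)
```

Without something like this, tools cannot report where in the source file a node came from.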

And how about the naming:

  • processor: rexml?
  • syntax tree: xast?

> processor: rexml?

That may cause confusion with https://github.com/ruby/rexml

Hmm, different ecosystem plus not many stars, I think it’s fine to reuse that name?

Parsing can now be done with syntax-tree/xast-util-from-xml and serialising with syntax-tree/xast-util-to-xml, so that means the building blocks for rexml (working title?) are there.

However, I’m not sure how well rexml fits in the list of remark, rehype, retext (, redot), and thus unified. I think that’s because XML is data, whereas the others are content. A rehype plugin has knowledge of the semantics of nodes, of what they mean, and uses that to do a task (find all headings, slugify them, add the slug as an id). XML doesn’t really have this.

So, I’m seeing use cases for xast and xast utilities:

  1. to parse and inspect data (I was recently parsing unicode-cldr)
  2. to construct and serialize data (EPUB files have lots of manifests in XML)

…and I do see a case of going from HTML -> XML with rehype-parse, rehype-rexml, rexml-stringify or so (EPUB books use XHTML)
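As a rough illustration of the second use case (constructing and serializing data), here is a dependency-free sketch that builds a small EPUB-style manifest and turns it into a string. The `x` builder and this `toXml` are hypothetical stand-ins written for this example; in practice the serializing would be done by xast-util-to-xml, whose output details may differ.

```javascript
// Escape the characters that are unsafe in XML text and attribute values.
function escapeText(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;')
}

function escapeAttribute(value) {
  return escapeText(value).replace(/"/g, '&quot;')
}

// Hypothetical builder for element nodes.
function x(name, attributes = {}, children = []) {
  return {type: 'element', name, attributes, children}
}

// Hypothetical minimal serializer: handles root, element, and text only.
function toXml(node) {
  if (node.type === 'text') return escapeText(node.value)
  if (node.type === 'root') return node.children.map(toXml).join('')
  if (node.type === 'element') {
    const attributes = Object.entries(node.attributes)
      .map(([key, value]) => ` ${key}="${escapeAttribute(value)}"`)
      .join('')
    if (node.children.length === 0) return `<${node.name}${attributes}/>`
    const body = node.children.map(toXml).join('')
    return `<${node.name}${attributes}>${body}</${node.name}>`
  }
  throw new Error('Unsupported node type: ' + node.type)
}

const manifest = x('manifest', {}, [
  x('item', {id: 'ch1', href: 'chapter-1.xhtml', 'media-type': 'application/xhtml+xml'}),
  x('item', {id: 'css', href: 'a&b.css', 'media-type': 'text/css'})
])

const xml = toXml(manifest)
console.log(xml)
```

Building the tree as data first means escaping is handled in one place, instead of being scattered across string concatenation.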

But I don’t really see the case where a whole unified pipeline would be useful:

unified()
  .use(rexmlParse)
  // …what plugins are useful here?
  .use(rexmlStringify)

I’m wondering, what use cases do you folks have for rexml? Should it exist?

Thanks for the addition!

My use case was simply to parse some XML and store it as an AST so I can use select or other utils to play with the tree.
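Roughly what I have in mind, as a sketch only: the tree below is written by hand where a parser would normally produce it, and `selectAllByName` and `textContent` are hypothetical helpers standing in for the existing unist utilities.

```javascript
// Hand-written tree standing in for the result of parsing an XML feed.
const feed = {
  type: 'root',
  children: [
    {
      type: 'element',
      name: 'feed',
      attributes: {},
      children: [
        {
          type: 'element',
          name: 'entry',
          attributes: {id: 'a'},
          children: [
            {type: 'element', name: 'title', attributes: {}, children: [{type: 'text', value: 'First'}]}
          ]
        },
        {
          type: 'element',
          name: 'entry',
          attributes: {id: 'b'},
          children: [
            {type: 'element', name: 'title', attributes: {}, children: [{type: 'text', value: 'Second'}]}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: collect every element with the given name, depth-first.
function selectAllByName(node, name, found = []) {
  if (node.type === 'element' && node.name === name) found.push(node)
  for (const child of node.children || []) selectAllByName(child, name, found)
  return found
}

// Hypothetical helper: concatenate the text inside a node.
function textContent(node) {
  if (node.type === 'text' || node.type === 'cdata') return node.value
  return (node.children || []).map(textContent).join('')
}

const titles = selectAllByName(feed, 'title').map(textContent)
console.log(titles)
```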

Could this be useful in translating HAST/MDAST to/from MJML?

It could be.
It's worth noting that, since MJML elements are also valid as HTML web components, rehype could be used as well.

It could be helpful to start a new idea thread for MJML.

> It could be helpful to start a new idea thread for MJML.

If you're game, so am I!