This library provides non-blocking parsers, writers and filters for handling streaming XML in the ZIO Scala framework, specifically as ZStream. Parsing is done by wrapping the Aalto XML parser. Writing uses the standard Java XMLOutputFactory mechanism (writing to a byte array, which is known not to block).
Currently, ZIO 2.0+ is targeted.
A stream of XML is modeled as a ZStream[Any, XMLStreamException, XmlEvent], where XmlEvent is a sealed trait that closely follows Java's own XMLEvent structure. A notable exception is that StartDocument and EndDocument are absent, since the start and end of a document are already indicated by the stream semantics themselves.
The XmlParser object provides a ZPipeline that can turn bytes into XML events:
object XmlParser {
  def parser(ignoreInvalidChars: Boolean = false): ZPipeline[Any, XMLStreamException, Byte, XmlEvent]
}
If you have a ZStream[Any, Nothing, Byte], you can feed that into the pipeline as follows:
val myStream: ZStream[Any, Nothing, Byte] = ???
val events: ZStream[Any, XMLStreamException, XmlEvent] = myStream >>> XmlParser.parser()
Several ways are available to turn XML events back into bytes or DOM-like data structures.
Two ZPipeline variants exist that emit a document tree after a tag (and its children) has been written:
object XmlWriter {
  def collectNode(): ZPipeline[Any, XMLStreamException, XmlEvent, scala.xml.Node]
  def collectElement(): ZPipeline[Any, XMLStreamException, XmlEvent, org.w3c.dom.Element]
}
The former emits a Scala XML Node, the latter emits a DOM Element. Use the variant that matches the other libraries you're working with.
You can also just write XML back to bytes, using another ZPipeline in XmlWriter.
object XmlWriter {
  def writeDocument(charset: Charset = StandardCharsets.UTF_8): ZPipeline[Any, XMLStreamException, XmlEvent, Byte]
  def writeFragment(charset: Charset = StandardCharsets.UTF_8): ZPipeline[Any, XMLStreamException, XmlEvent, Byte]
}
Two variants are available. You'll pick one depending on whether you plan to write a single document (writeDocument) or potentially multiple root nodes as an XML fragment (writeFragment).
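The difference between a document and a fragment, and the idea of writing into an in-memory byte array, can be illustrated with the JDK's own XMLStreamWriter. This is only a sketch using standard javax.xml.stream calls, not the library's implementation; the names WriterSketch, render, document and fragment are made up for this example, and it assumes Java 9+ for newDefaultFactory (used so that the JDK's built-in writer is chosen, which does not reject multiple root elements).

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLOutputFactory, XMLStreamWriter}

// Hypothetical helper object, for illustration only.
object WriterSketch {
  // JDK built-in factory (Java 9+), independent of other StAX providers on the classpath.
  private val factory = XMLOutputFactory.newDefaultFactory()

  /** Runs `body` against a writer backed by an in-memory buffer and returns the text written. */
  private def render(body: XMLStreamWriter => Unit): String = {
    val buffer = new ByteArrayOutputStream() // in-memory target, so writes never wait on I/O
    val writer = factory.createXMLStreamWriter(buffer, "UTF-8")
    body(writer)
    writer.flush()
    writer.close()
    new String(buffer.toByteArray, StandardCharsets.UTF_8)
  }

  /** A complete document: XML declaration followed by a single root element. */
  def document(): String = render { w =>
    w.writeStartDocument("UTF-8", "1.0")
    w.writeStartElement("root")
    w.writeCharacters("hello")
    w.writeEndElement()
    w.writeEndDocument()
  }

  /** A fragment: no declaration, and several top-level elements in a row. */
  def fragment(): String = render { w =>
    Seq("1", "2").foreach { text =>
      w.writeStartElement("item")
      w.writeCharacters(text)
      w.writeEndElement()
    }
  }
}
```

Calling WriterSketch.document() yields text that starts with an XML declaration, while WriterSketch.fragment() yields two sibling item elements with no declaration and no common root.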
In addition to parsing and writing, a few filters are provided that have proven useful as glue logic. See their ScalaDoc for details. Combined with XmlWriter.collectNode, they can be used to gather up pieces of a large XML stream for further piecemeal processing.
object XmlFilter {
  /** Filters subtrees of nodes residing in the XML document at the direct ancestors given in [path]. The
    * subtrees will have the last element of [path] as their parent. Higher ancestors are filtered out. For
    * example, filterSubtree("foo" :: "bar" :: Nil), given <xml><foo><bar>1</bar><hello/><bar>2</bar></foo></xml>,
    * will emit events for <bar>1</bar><bar>2</bar>.
    */
  def filterSubtree(path: Seq[String]): ZPipeline[Any, Nothing, XmlEvent, XmlEvent]

  /** Filters subtrees of nodes in the XML with the given name, at any path. The subtrees will have [tagName]
    * as their parent (ancestors are filtered out).
    */
  def filterTag(tagName: String): ZPipeline[Any, Nothing, XmlEvent, XmlEvent]

  /** Removes nodes with the given name, and all of their children, from the stream. The node may occur at any
    * level. The rest of the stream is passed through unchanged.
    */
  def filterTagNot(tagName: String): ZPipeline[Any, Nothing, XmlEvent, XmlEvent]
}
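To make the documented behaviour of filterTag and filterTagNot concrete, here is a minimal sketch of the same idea over a plain List, using a depth counter. It is not the library's code: the Event model (Start, End, Text) and the functions keepTag and dropTag are simplified stand-ins invented for this example, and the real XmlEvent type has more cases and fields.

```scala
// Hypothetical, simplified model; names are assumptions made for this sketch.
object FilterSketch {
  sealed trait Event
  final case class Start(name: String) extends Event
  final case class End(name: String) extends Event
  final case class Text(value: String) extends Event

  /** Keeps only subtrees rooted at `tagName`, at any depth (the behaviour described for filterTag). */
  def keepTag(tagName: String)(events: List[Event]): List[Event] = {
    // depth counts open elements from the matching element downwards; 0 means "outside a match"
    val (_, kept) = events.foldLeft((0, Vector.empty[Event])) {
      case ((depth, out), e @ Start(name)) =>
        if (depth > 0 || name == tagName) (depth + 1, out :+ e) else (depth, out)
      case ((depth, out), e @ End(_)) =>
        if (depth > 0) (depth - 1, out :+ e) else (depth, out)
      case ((depth, out), e) =>
        if (depth > 0) (depth, out :+ e) else (depth, out)
    }
    kept.toList
  }

  /** Drops subtrees rooted at `tagName` and passes everything else through (as described for filterTagNot). */
  def dropTag(tagName: String)(events: List[Event]): List[Event] = {
    val (_, kept) = events.foldLeft((0, Vector.empty[Event])) {
      case ((depth, out), Start(name)) if depth > 0 || name == tagName => (depth + 1, out)
      case ((depth, out), End(_)) if depth > 0                          => (depth - 1, out)
      case ((depth, out), e) => if (depth > 0) (depth, out) else (depth, out :+ e)
    }
    kept.toList
  }

  // Events for <xml><foo><bar>1</bar><hello></hello></foo></xml>
  val sample: List[Event] = List(
    Start("xml"), Start("foo"), Start("bar"), Text("1"), End("bar"),
    Start("hello"), End("hello"), End("foo"), End("xml")
  )
}
```

With this sample, FilterSketch.keepTag("bar")(FilterSketch.sample) leaves only the three events of the bar element, and FilterSketch.dropTag("hello")(FilterSketch.sample) removes just the two hello events. A streaming version would carry the same depth counter as pipeline state instead of folding over a list.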
When writing XML that should be easily readable by humans, it can be convenient to add indentation to make the nesting of XML elements easier to follow. A ZPipeline is provided that will re-indent an XML stream on the fly.
object XmlIndenter {
  /** Indents a stream of XML parse events (removing any previous indentation first) */
  def indent(amount: Int = 2): ZPipeline[Any, Nothing, XmlEvent, XmlEvent]
}
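One way to picture re-indentation on an event stream is as two steps: drop whitespace-only text (the previous indentation), then insert a line break plus padding in front of each nested element. The sketch below does this over a plain List and is only an assumption about how such a filter could work, not the library's algorithm. The Event model and the indent and render helpers are invented for the example, and mixed content (text next to child elements) is not treated specially.

```scala
// Hypothetical, simplified model; names are assumptions made for this sketch.
object IndentSketch {
  sealed trait Event
  final case class Start(name: String) extends Event
  final case class End(name: String) extends Event
  final case class Text(value: String) extends Event

  def indent(amount: Int)(events: List[Event]): List[Event] = {
    def pad(depth: Int): Event = Text("\n" + " " * (amount * depth))

    // Step 1: remove whitespace-only text, i.e. any previous indentation.
    val stripped = events.filter {
      case Text(v) => v.trim.nonEmpty
      case _       => true
    }

    // Step 2: re-insert line breaks and padding based on the nesting depth.
    val (_, _, out) = stripped.foldLeft((0, Option.empty[Event], Vector.empty[Event])) {
      case ((depth, prev, acc), e @ Start(_)) =>
        // every start tag except the very first goes on its own line
        val lead = if (prev.isEmpty) Vector.empty[Event] else Vector(pad(depth))
        (depth + 1, Some(e), (acc ++ lead) :+ e)
      case ((depth, prev, acc), e @ End(_)) =>
        // an end tag directly after another end tag closes child elements: put it on its own line
        val closesChildren = prev.exists(_.isInstanceOf[End])
        val lead = if (closesChildren) Vector(pad(depth - 1)) else Vector.empty[Event]
        (depth - 1, Some(e), (acc ++ lead) :+ e)
      case ((depth, _, acc), e) =>
        (depth, Some(e), acc :+ e)
    }
    out.toList
  }

  /** Renders events as text, purely so the result can be inspected. */
  def render(events: List[Event]): String = events.map {
    case Start(n) => s"<$n>"
    case End(n)   => s"</$n>"
    case Text(v)  => v
  }.mkString
}
```

Printing IndentSketch.render(IndentSketch.indent(2)(events)) for a small event list is a quick way to see where the line breaks end up.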