lucidsoftware/xtract

How to parse tree-like XML tags

Closed this issue · 8 comments

I need to parse a tree-like hierarchy of tags into a corresponding structure of case classes (boolean expressions, in my case).
Here is a small example:

<And>
    <Condition1>A</Condition1>
    <Condition2>B</Condition2>
    <Or>
        <Condition3>C</Condition3>
        <Condition4>D</Condition4>
        <And>
            <Condition5>E</Condition5>
            <Condition6>F</Condition6>
        </And>
    </Or>
    <Or>
        <Condition5>EE</Condition5>
        <Condition6>FF</Condition6>
    </Or>
</And>

I have case classes for And, Or and other ConditionX tags.

So, how do I write an XmlReader for this case?

P.S. I'm using version 2.1.0

It's hard to know the best approach without seeing your actual classes.

But assuming you have a sealed trait Condition that all the condition classes extend, you might have something like:

object Condition {
  implicit val xmlReader: XmlReader[Condition] = XmlReader.nodeReader.flatMap { node =>
    node.label match {
      case "And" => And.xmlReader
      case "Or" => Or.xmlReader
      case "Condition1" => Condition1.xmlReader
      // ... one case per remaining condition
    }
  }
}

object And {
  implicit val xmlReader: XmlReader[And] = (__ \ "And").read(XmlReader.seq[Condition]).map(And(_))
}

I haven't tested this at all, and it could probably be a little more efficient if you passed the node's children to the condition's reader instead of the whole node, but that is the general idea.

The problem here is that the And reader doesn't apply the Condition reader to its children,
so it ends up as just an empty And.

It seems like I managed to make it work.

object TreeCondition extends XmlReader[Condition] {
  private def processNode(node: Node): ParseResult[Condition] = {
    node.label match {
      case "And" =>
        ParseResult.combine(
          node.nonEmptyChildren.map(processNode)
        ).map(And.apply)
      case "Or" =>
        ParseResult.combine(
          node.nonEmptyChildren.map(processNode)
        ).map(Or.apply)
      case "Condition1" =>
        Condition1.reader.read(node)
      case _ =>
        ParseFailure()
    }
  }

  override def read(node: NodeSeq): ParseResult[Condition] = XmlReader.nodeReader.read(node).flatMap(processNode)
}

But I don't like having to list all the "leaf" tag labels in one place; that kind of coupling is inconvenient.

Another question is how to make the tree traversal abstract, so that it doesn't depend on a particular ADT.

> But I don't like that I have to mention "leaf" tag labels in one place. Such a binding is unhandy.

I suppose another option is to create a collection of XmlReaders, one per condition, and have a reader that tries each of them until one succeeds. You could probably use reflection (possibly at compile time, in a macro) to generate such a collection. Or, if you don't mind having the leaf readers in one place, you could do something like:

val reader = And.reader or Or.reader or Condition1.reader or ...
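The "tries each of them until one succeeds" idea can be sketched without any XML library. In this sketch XNode and Reader are hypothetical stand-ins for scala.xml.Node and XmlReader (a reader returns None when a node isn't its tag), and firstSuccess is a hypothetical helper. It does roughly what a hand-written chain like And.reader or Or.reader or ... does, except that the list of readers could just as well be generated.

```scala
import scala.util.Try

object FirstSuccess {
  // Hypothetical stand-in for scala.xml.Node, so the sketch runs without an XML library.
  final case class XNode(label: String, children: List[XNode] = Nil, text: String = "")

  // Hypothetical stand-in for XmlReader[A]: None means "not my tag, or unreadable".
  type Reader[A] = XNode => Option[A]

  // Try each reader in order and return the first success. The iterator is lazy,
  // so readers after the first success are never applied.
  def firstSuccess[A](readers: Seq[Reader[A]]): Reader[A] =
    node => readers.iterator.map(read => read(node)).collectFirst { case Some(a) => a }

  sealed trait Condition
  final case class Condition1(value: String) extends Condition
  final case class Condition2(value: Int) extends Condition
  final case class Condition3(value: Long) extends Condition

  // One reader per leaf; each one only accepts nodes carrying its own label.
  val leafReaders: Seq[Reader[Condition]] = Seq(
    (n: XNode) => if (n.label == "Condition1") Some(Condition1(n.text)) else None,
    (n: XNode) => if (n.label == "Condition2") Try(n.text.trim.toInt).toOption.map(Condition2(_)) else None,
    (n: XNode) => if (n.label == "Condition3") Try(n.text.trim.toLong).toOption.map(Condition3(_)) else None
  )

  val conditionReader: Reader[Condition] = firstSuccess(leafReaders)
}
```

For example, FirstSuccess.conditionReader(XNode("Condition2", text = "10")) gives Some(Condition2(10)). The order of the list only matters if two readers can accept the same node; when every leaf reader checks its own label, the first match is also the only one.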

> Another question is how to make the tree-traversing abstract and not dependent on particular ADT

I'm not really sure what you mean by this. What would that look like?
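One possible reading of the question, sketched under assumptions: a traversal that is generic in its result type T and knows nothing about Condition, And or Or. It is handed two tables, one mapping compound labels to constructors over the children's results and one mapping leaf labels to text parsers, and it stops at the first error. XNode, treeReader and sequence are hypothetical names for this illustration only, and Either stands in for xtract's ParseResult.

```scala
import scala.util.Try

object GenericTreeReader {
  // Hypothetical stand-in for scala.xml.Node, kept minimal so the sketch runs on its own.
  final case class XNode(label: String, children: List[XNode] = Nil, text: String = "")

  // Turn per-child results into one result; the leftmost error wins.
  def sequence[A](results: List[Either[String, A]]): Either[String, List[A]] =
    results.foldRight[Either[String, List[A]]](Right(Nil)) { (r, acc) =>
      for { a <- r; as <- acc } yield a :: as
    }

  // Generic in T: the traversal only knows about labels and the two lookup tables.
  def treeReader[T](
      compound: Map[String, List[T] => T],
      leaf: Map[String, String => Either[String, T]]
  ): XNode => Either[String, T] = {
    def go(node: XNode): Either[String, T] =
      compound.get(node.label) match {
        case Some(build) => sequence(node.children.map(go)).map(build)
        case None =>
          leaf.get(node.label) match {
            case Some(parseText) => parseText(node.text)
            case None            => Left(s"unknown tag <${node.label}>")
          }
      }
    node => go(node)
  }

  // The boolean-expression ADT from the question, plugged in from the outside.
  sealed trait Condition
  final case class And(value: List[Condition]) extends Condition
  final case class Or(value: List[Condition]) extends Condition
  final case class Condition1(value: String) extends Condition
  final case class Condition2(value: Int) extends Condition

  val conditionReader: XNode => Either[String, Condition] = treeReader[Condition](
    compound = Map(
      "And" -> ((cs: List[Condition]) => And(cs)),
      "Or"  -> ((cs: List[Condition]) => Or(cs))
    ),
    leaf = Map(
      "Condition1" -> ((s: String) => Right(Condition1(s))),
      "Condition2" -> ((s: String) => Try(s.trim.toInt).toOption.map(Condition2(_)).toRight(s"not an Int: $s"))
    )
  )
}
```

With this shape, supporting a different ADT means supplying two different maps while the recursion stays the same. This sketch's sequence is all-or-nothing, which may differ from how xtract's ParseResult.combine treats partial results.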

Using "or" is the obvious idea here, but there is a problem.
The leaf readers can't be defined with a path like
(__ \ "Condition").read...
They have to be defined like
__.read...

The reason is that a path reader reads the content of the matching child tag, not of the current one. Take this input:

<And>
  <Condition>1</Condition>
</And>

So if I define a "leaf" reader with the tag label in the path, it will look for a "leaf" tag inside the "leaf" tag. In other words, it will expect this:

<And>
  <Condition><Condition>1</Condition></Condition>
</And>

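That is why I have to define "leaf" readers using the second variant, the __.read... construct.
But a reader defined that way doesn't check the tag label at all, so chaining such readers with "or" could parse a tag with the wrong reader.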
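The misread can be reproduced in a few lines. Below, anyString and anyInt play the role of label-agnostic readers such as __.read[String] and __.read[Int], and withLabel is a hypothetical guard that only runs a reader when the node carries the expected label. XNode and Reader are again hypothetical stand-ins, not xtract types.

```scala
import scala.util.Try

object LabelGuard {
  // Hypothetical stand-ins for scala.xml.Node and XmlReader[A].
  final case class XNode(label: String, children: List[XNode] = Nil, text: String = "")
  type Reader[A] = XNode => Option[A]

  // Label-agnostic readers, analogous to __.read[String] and __.read[Int]:
  // they look at the node's text and never at its label.
  val anyString: Reader[String] = n => Some(n.text)
  val anyInt: Reader[Int] = n => Try(n.text.trim.toInt).toOption

  def map[A, B](r: Reader[A])(f: A => B): Reader[B] = n => r(n).map(f)
  // Option.orElse is by-name, so `second` only runs when `first` yields None.
  def orElse[A](first: Reader[A], second: Reader[A]): Reader[A] = n => first(n).orElse(second(n))

  // Hypothetical guard: run `r` only when the node carries the expected label.
  def withLabel[A](name: String)(r: Reader[A]): Reader[A] =
    n => if (n.label == name) r(n) else None

  sealed trait Condition
  final case class Condition1(value: String) extends Condition
  final case class Condition2(value: Int) extends Condition

  // Unguarded: the String reader accepts any node, so it always wins.
  val unguarded: Reader[Condition] =
    orElse[Condition](map(anyString)(Condition1(_)), map(anyInt)(Condition2(_)))

  // Guarded: each alternative first checks that the tag is its own.
  val guarded: Reader[Condition] =
    orElse[Condition](
      withLabel("Condition1")(map(anyString)(Condition1(_))),
      withLabel("Condition2")(map(anyInt)(Condition2(_)))
    )
}
```

Given XNode("Condition2", text = "10"), the unguarded chain returns Some(Condition1("10")) because the String reader accepts any text, while the guarded chain returns Some(Condition2(10)). Checking the label inside each leaf reader is what makes "or" safe here; the XmlReader.label call in the simplified version later in the thread appears to serve that purpose.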

This is my whole test example, and it works fine, but I would appreciate it if you could suggest a more concise syntax. In real life I have more than 50 different conditions. Thanks in advance.

import com.lucidchart.open.xtract._

import scala.xml.{Node, NodeSeq, XML}

sealed trait Condition
object Condition {
  implicit val reader: XmlReader[Condition] = TreeCondition
}
case class And(value: Seq[Condition]) extends Condition
case class Or(value: Seq[Condition]) extends Condition
case class Condition1(value: String) extends Condition
object Condition1 {
  implicit val reader: XmlReader[Condition1] = __.read[String].map(Condition1(_))
}
case class Condition2(value: Int) extends Condition
object Condition2 {
  implicit val reader: XmlReader[Condition2] = __.read[Int].map(Condition2(_))
}
case class Condition3(value: Long) extends Condition
object Condition3 {
  implicit val reader: XmlReader[Condition3] = __.read[Long].map(Condition3(_))
}

object TreeCondition extends XmlReader[Condition] {
  private def processNode(node: Node): ParseResult[Condition] = {
    node.label match {
      case "And" => ParseResult.combine(node.nonEmptyChildren.map(processNode)).map(And(_))
      case "Or" => ParseResult.combine(node.nonEmptyChildren.map(processNode)).map(Or(_))
      case "Condition1" => Condition1.reader.read(node)
      case "Condition2" => Condition2.reader.read(node)
      case "Condition3" => Condition3.reader.read(node)
      case _ =>
        ParseFailure()
    }
  }

  override def read(node: NodeSeq): ParseResult[Condition] = XmlReader.nodeReader.read(node).flatMap(processNode)
}

object Main extends App {
  val str =
    """
      |<And>
      |    <Condition1>A</Condition1>
      |    <Condition2>1</Condition2>
      |    <Or>
      |        <Condition1>C</Condition1>
      |        <Condition2>10</Condition2>
      |        <And>
      |            <Condition1>E</Condition1>
      |            <Condition3>500</Condition3>
      |        </And>
      |    </Or>
      |    <Or>
      |        <Condition1>G</Condition1>
      |        <Condition3>255</Condition3>
      |    </Or>
      |    <And>
      |        <Condition2>15</Condition2>
      |        <Condition3>354</Condition3>
      |    </And>
      |</And>
      |""".stripMargin

  val xml = XML.loadString(str)

  val parsed = XmlReader.of[Condition].read(xml)

  val expected =
    And(Seq(
      Condition1("A"),
      Condition2(1),
      Or(Seq(
        Condition1("C"),
        Condition2(10),
        And(Seq(
          Condition1("E"),
          Condition3(500)
        ))
      )),
      Or(Seq(
        Condition1("G"),
        Condition3(255)
      )),
      And(Seq(
        Condition2(15),
        Condition3(354)
      ))
    ))

  println(parsed)
  println(s"Parsed equal to expected ${parsed.getOrElse(null) == expected}")
}

I was able to simplify that a little bit, down to:

import com.lucidchart.open.xtract._

import scala.xml.{Node, NodeSeq, XML}
import scala.reflect._

sealed trait Condition
object Condition {
  // it's possible to generate this with a macro, if so desired
  implicit val reader: XmlReader[Condition] = And.reader | Or.reader | Condition1.reader | Condition2.reader | Condition3.reader
}

abstract class ConditionParser[T <: Condition : ClassTag, U: XmlReader] {
  protected val name: String = implicitly[ClassTag[T]].runtimeClass.getSimpleName

  def apply(v: U): T
  implicit val reader: XmlReader[T] = XmlReader.label[U](name).map(apply _)
}

abstract class CompoundConditionParser[T <: Condition : ClassTag] extends ConditionParser[T, Seq[Condition]]()(implicitly, __.children.lazyRead(XmlReader.strictReadSeq(Condition.reader)))

case class And(value: Seq[Condition]) extends Condition
object And extends CompoundConditionParser[And]
case class Or(value: Seq[Condition]) extends Condition
object Or extends CompoundConditionParser[Or]
case class Condition1(value: String) extends Condition
object Condition1 extends ConditionParser[Condition1, String]
case class Condition2(value: Int) extends Condition
object Condition2 extends ConditionParser[Condition2, Int]
case class Condition3(value: Long) extends Condition
object Condition3 extends ConditionParser[Condition3, Long]
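ConditionParser derives the tag name with getSimpleName. That works for top-level case classes, but on older JVMs (notably Java 8) getSimpleName has been known to throw java.lang.InternalError ("Malformed class name") for some Scala classes nested inside objects. A minimal stdlib-only sketch of the same "derive the label from the ClassTag" idea, assuming the label is the last segment of the runtime class name; labelOf, LeafParser, build and the ...Parser objects are hypothetical names, and this LeafParser reads a label/text pair rather than XML.

```scala
import scala.reflect.ClassTag
import scala.util.Try

object LabelFromClassTag {
  // Take the last segment after '.' or '$' of the runtime class name, so a class
  // nested inside objects (e.g. "Outer$Condition1") still yields "Condition1".
  def labelOf[T](implicit tag: ClassTag[T]): String =
    tag.runtimeClass.getName.split(Array('.', '$')).filter(_.nonEmpty).last

  sealed trait Condition
  final case class Condition1(value: String) extends Condition
  final case class Condition2(value: Int) extends Condition

  // Hypothetical analogue of ConditionParser: the label is computed once from the
  // ClassTag, and `read` only accepts input carrying that label.
  abstract class LeafParser[T <: Condition: ClassTag, U](parse: String => Option[U]) {
    def build(v: U): T
    val name: String = labelOf[T]
    def read(label: String, text: String): Option[T] =
      if (label == name) parse(text).map(build) else None
  }

  object Condition1Parser extends LeafParser[Condition1, String](s => Some(s)) {
    def build(v: String): Condition1 = Condition1(v)
  }

  object Condition2Parser extends LeafParser[Condition2, Int](s => Try(s.trim.toInt).toOption) {
    def build(v: Int): Condition2 = Condition2(v)
  }
}
```

For example, Condition2Parser.read("Condition2", "15") gives Some(Condition2(15)), and the same call with label "Condition1" gives None.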

It still requires listing all of the conditions, but it removes a lot of the boilerplate. It is probably possible to use a macro to generate the list of possible conditions, but unless you have a ton of conditions and they change frequently, it probably isn't worth it.
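One detail in CompoundConditionParser that is easy to miss is the lazyRead around Condition.reader. Condition.reader is built from And.reader, and And.reader needs Condition.reader for the children, so evaluating both eagerly during object initialization can capture a null reader. Presumably that is what lazyRead is for here: it defers the lookup until the first read. A minimal sketch of that deferral with plain functions, where lazily is a hypothetical by-name helper and XNode and Reader are hypothetical stand-ins:

```scala
object LazyRecursion {
  // Hypothetical stand-in for scala.xml.Node.
  final case class XNode(label: String, children: List[XNode] = Nil, text: String = "")
  // Hypothetical stand-in for XmlReader[A]; None means the read failed.
  type Reader[A] = XNode => Option[A]

  // Defers evaluating `r` until the returned reader is first applied.
  def lazily[A](r: => Reader[A]): Reader[A] = {
    lazy val resolved = r
    node => resolved(node)
  }

  def orElse[A](a: Reader[A], b: Reader[A]): Reader[A] = n => a(n).orElse(b(n))
  def map[A, B](r: Reader[A])(f: A => B): Reader[B] = n => r(n).map(f)
  def labelled[A](name: String)(r: Reader[A]): Reader[A] =
    n => if (n.label == name) r(n) else None

  // Reads every child with `r`; fails if any child fails.
  def childrenOf[A](r: Reader[A]): Reader[List[A]] = n => {
    val results = n.children.map(r)
    if (results.forall(_.isDefined)) Some(results.flatten) else None
  }

  sealed trait Condition
  final case class And(value: List[Condition]) extends Condition
  final case class Leaf(value: String) extends Condition

  object Leaf {
    val reader: Reader[Condition] = labelled[Condition]("Leaf")(n => Some(Leaf(n.text)))
  }

  object And {
    // childrenOf is strict in its argument, so without `lazily` this line would
    // read Condition.reader while the objects are still being initialised.
    val reader: Reader[Condition] =
      labelled[Condition]("And")(map(childrenOf(lazily(Condition.reader)))(cs => And(cs)))
  }

  object Condition {
    // Refers to And.reader, whose definition refers back to Condition.reader.
    val reader: Reader[Condition] = orElse(And.reader, Leaf.reader)
  }
}
```

With the deferral in place, Condition.reader applied to an And node containing a Leaf and a nested And builds the full tree, regardless of which object is touched first.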

I suppose another option would be to look at the label, use reflection to find a class with that name, and, if that succeeds, look at the parameter type of its constructor to decide how to extract the contents of the tag. That would remove the need for the parser to know all the conditions at compile time, but it adds complexity and probably doesn't perform very well.

I'm going to close this since it hasn't been active for a while. If there are remaining questions, I can reopen it.