`fs2.data.xml.XmlException: character 'ʿ' cannot start a NCName`
armanbilge opened this issue · 6 comments
Found via a comment on http4s/http4s-scala-xml#25.
```scala
//> using scala "3.1.2"
//> using lib "org.gnieh::fs2-data-xml-scala::1.4.1"

import cats.effect.*
import fs2.*
import scala.xml.*

val xml = """<Ẵ줐샃뗧饜孫 悊頃ふ퉞="ꨍ邭䋒ừ" 듸괎:ʿक턻뽜="촏"/>"""

object App extends IOApp.Simple {
  def run = for
    _ <- IO(XML.loadString(xml)) *> IO.println("scala-xml works")
    _ <- Stream.emit(xml).covary[IO].through(fs2.data.xml.events()).compile.drain *> IO.println("fs2-data works")
  yield ()
}
```
```
scala-xml works
fs2.data.xml.XmlException: character 'ʿ' cannot start a NCName
at fs2.data.xml.internals.EventParser$.fail$1$$anonfun$1(EventParser.scala:40)
at fs2.Pull$$anon$2.cont(Pull.scala:183)
at fs2.Pull$BindBind.cont(Pull.scala:701)
at fs2.Pull$ContP.apply(Pull.scala:649)
at fs2.Pull$ContP.apply$(Pull.scala:648)
at fs2.Pull$Bind.apply(Pull.scala:657)
at fs2.Pull$Bind.apply(Pull.scala:657)
at fs2.Pull$.go$1$$anonfun$1(Pull.scala:1207)
at fs2.Pull$.interruptGuard$1$$anonfun$1(Pull.scala:933)
at get @ fs2.internal.Scope.openScope(Scope.scala:281)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Pull$.goCloseScope$1$$anonfun$1$$anonfun$3(Pull.scala:1187)
at update @ fs2.internal.Scope.releaseChildScope(Scope.scala:227)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at modify @ fs2.internal.Scope.close(Scope.scala:262)
at flatMap @ fs2.Compiler$Target.flatMap(Compiler.scala:162)
at flatMap @ fs2.Pull$.goCloseScope$1$$anonfun$1(Pull.scala:1188)
at handleErrorWith @ fs2.Compiler$Target.handleErrorWith(Compiler.scala:160)
at flatMap @ fs2.Pull$.goCloseScope$1(Pull.scala:1195)
at get @ fs2.internal.Scope.openScope(Scope.scala:281)
```
Adding Scalacheck-based tests as proposed in scala/scala-xml#606 would help catch these in fs2-data itself.
I fear this is a limitation of the current character enumeration method. I need to dig deeper.
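After investigating further, I understand the problem. The fs2-data XML parser follows the Namespaces in XML specification, so the prefix and local part of every element and attribute name must be an NCName, which restricts the range of valid name characters: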
> The character classes defined here can be derived from the Unicode 2.0 character database as follows:
>
> - Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.
> - Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.
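The two quoted rules can be checked against the JDK's Unicode tables. The following is a minimal sketch, assuming `java.lang.Character.getType` is an acceptable stand-in for the Unicode 2.0 database (the JDK ships a much newer Unicode version) and ignoring the further exceptions the spec lists alongside these two rules. `LegacyNameRules` is a hypothetical name, not fs2-data code. Under these two rules alone, 'ʿ' (U+02BF, MODIFIER LETTER LEFT HALF RING, category Lm) may appear inside a name but may not start one, which is consistent with the error above.

```scala
// Hypothetical sketch of the two category-based name rules quoted above.
// Uses the JDK's Unicode tables as an approximation of Unicode 2.0 and
// ignores the spec's additional exceptions.
object LegacyNameRules {
  import java.lang.{Character => C}

  // Categories allowed to start a name: Ll, Lu, Lo, Lt, Nl
  private val startCategories: Set[Int] = Set(
    C.LOWERCASE_LETTER.toInt,
    C.UPPERCASE_LETTER.toInt,
    C.OTHER_LETTER.toInt,
    C.TITLECASE_LETTER.toInt,
    C.LETTER_NUMBER.toInt
  )

  // Categories allowed only after the first character: Mc, Me, Mn, Lm, Nd
  private val extraNameCategories: Set[Int] = Set(
    C.COMBINING_SPACING_MARK.toInt,
    C.ENCLOSING_MARK.toInt,
    C.NON_SPACING_MARK.toInt,
    C.MODIFIER_LETTER.toInt,
    C.DECIMAL_DIGIT_NUMBER.toInt
  )

  // True if the code point may start a name under the category rule
  def isNameStart(cp: Int): Boolean =
    startCategories.contains(C.getType(cp))

  // True if the code point may appear anywhere after the first position
  def isNameChar(cp: Int): Boolean =
    isNameStart(cp) || extraNameCategories.contains(C.getType(cp))
}
```

For example, `LegacyNameRules.isNameStart(0x02BF)` is `false` while `LegacyNameRules.isNameChar(0x02BF)` is `true`.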
I could make this configurable through an option (NCName parsing or not).
Would that be acceptable to you?
Actually, it looks like I was referring to an obsolete definition of names; I need to change it.
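The current definition is the one in XML 1.0 (Fifth Edition), which specifies `NameStartChar` and `NameChar` as explicit code point ranges rather than Unicode categories; Namespaces in XML then defines an NCName as a Name without ':', and a QName as an optional NCName prefix, a colon, and an NCName local part. Below is a minimal sketch of those productions; the object and method names are hypothetical and not fs2-data API. Under these ranges U+02BF falls in `[#xF8-#x2FF]`, so the attribute name `듸괎:ʿक턻뽜` from the reproduction is a valid QName.

```scala
// Hypothetical sketch of the XML 1.0 (Fifth Edition) name productions,
// restricted to NCName (no ':') and QName as in Namespaces in XML.
object Xml10FifthEditionNames {
  // NameStartChar ranges, minus ':' (excluded from NCName)
  private val startRanges: List[(Int, Int)] = List(
    ('A'.toInt, 'Z'.toInt), ('_'.toInt, '_'.toInt), ('a'.toInt, 'z'.toInt),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF)
  )

  // Extra ranges that NameChar adds on top of NameStartChar
  private val extraNameRanges: List[(Int, Int)] = List(
    ('-'.toInt, '-'.toInt), ('.'.toInt, '.'.toInt), ('0'.toInt, '9'.toInt),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040)
  )

  private def inRanges(cp: Int, ranges: List[(Int, Int)]): Boolean =
    ranges.exists { case (lo, hi) => lo <= cp && cp <= hi }

  def isNCNameStart(cp: Int): Boolean = inRanges(cp, startRanges)

  def isNCNameChar(cp: Int): Boolean =
    isNCNameStart(cp) || inRanges(cp, extraNameRanges)

  // Iterate over code points so supplementary characters are handled whole
  def isNCName(s: String): Boolean = {
    val cps = s.codePoints().toArray()
    cps.nonEmpty && isNCNameStart(cps.head) && cps.tail.forall(isNCNameChar)
  }

  // QName ::= (Prefix ':')? LocalPart, where both parts are NCNames
  def isQName(s: String): Boolean = s.split(":", -1) match {
    case Array(local)         => isNCName(local)
    case Array(prefix, local) => isNCName(prefix) && isNCName(local)
    case _                    => false
  }
}
```

Checking the prefix and local part separately is what makes the position of 'ʿ' matter: it is the first character of the local part, so it is validated as a start character.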
Glad you figured it out. I have no clue about this stuff; I was just reporting the discrepancy I discovered. Appreciate your work!!
Btw, Ross ended up publishing ScalaCheck instances for scala-xml:
https://github.com/typelevel/scalacheck-xml