typelevel/jawn

Special character stripped away in String containing double quotes

Closed · 3 comments

I recently encountered an issue where £ is stripped from a JSON string when the string also contains a ". Here is a relatively small test using the Argonaut facade:

import argonaut._
import jawn.AsyncParser

case class Example(i: Int, s: String)

object Example {
  implicit val codec = Argonaut.casecodec2(Example.apply, Example.unapply)("i", "s")
}

val name = "hel£££l \"o £ "
val example = Example(5, name)
val s = Argonaut.nospace.pretty(Example.codec.encode(example)) // {"i":5,"s":"hel£££l \"o £ "}
implicit val facade = jawn.support.argonaut.Parser.facade

val parser = AsyncParser[Json](AsyncParser.SingleValue)
val parsedString = parser.absorb(s)
  .fold(_ => None, _.headOption)
  .flatMap(_ -| "s")
  .flatMap(_.as[String].toOption)

assert(parsedString === Some(name))  // Some("hell "o  ") did not equal Some("hel£££l "o £ ")

The cause appears to me to be a bug in CharBuilder, which is used by ByteBasedParser#parseString on a code path taken only when the string contains escaped characters (such as \"). A multibyte character (such as £) is appended to the builder using the extend method, which does not seem to advance the len index, so the next appended character overwrites it.
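To illustrate the suspected failure mode, here is a minimal sketch of a growable character buffer. It is not jawn's actual CharBuilder: the class name `SketchCharBuilder`, the `advanceOnExtend` flag, and the `build` helper are hypothetical and exist only to show both behaviours side by side. With the flag off, `extend` writes its characters but leaves `len` where it was, so the following `append` lands on the same slot.

```scala
// Hypothetical stand-in for a growable char buffer; NOT jawn's actual CharBuilder.
final class SketchCharBuilder(advanceOnExtend: Boolean) {
  private var cs = new Array[Char](32)
  private var len = 0 // index of the next free slot

  // Grow the backing array when the requested length does not fit.
  private def ensure(goal: Int): Unit =
    if (goal > cs.length)
      cs = java.util.Arrays.copyOf(cs, math.max(goal, cs.length * 2))

  // Append a single char and advance len.
  def append(c: Char): Unit = {
    ensure(len + 1)
    cs(len) = c
    len += 1
  }

  // Append a whole sequence (e.g. a decoded multibyte character).
  def extend(s: CharSequence): Unit = {
    val newLen = len + s.length
    ensure(newLen)
    var i = 0
    while (i < s.length) { cs(len + i) = s.charAt(i); i += 1 }
    // Assumption: the reported bug corresponds to this update being missing.
    if (advanceOnExtend) len = newLen
  }

  def makeString: String = new String(cs, 0, len)
}

object SketchCharBuilderDemo {
  // Mimics the mixed path: plain char, multibyte char via extend, plain char.
  def build(advanceOnExtend: Boolean): String = {
    val b = new SketchCharBuilder(advanceOnExtend)
    b.append('a')
    b.extend("£")
    b.append('b')
    b.makeString
  }
}
```

In this sketch, `SketchCharBuilderDemo.build(false)` loses the £ in the same way as the failing assertion above, while `SketchCharBuilderDemo.build(true)` keeps it.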

The patch in 38f5f1f makes Julien's test case pass for me.

@non when you get some time, would you mind having a look at @bmjames's PR?

I can confirm that this is the villain in http4s/http4s#514.