zio/zio-json

Provide decoding directly from InputStream (and probably Reader) for JVM

gnp opened this issue

gnp commented

There should be a decodeInputStream method on JsonDecoder for the JVM. Below are some notes on my use case (including performance) and how I wrote an equivalent in user space.

I have code that obtains an InputStream for a JSON entry stored in a Zip file, and I want to decode that JSON.

Since ZIO JSON has no API for reading from an InputStream, I first built a String from the InputStream and then parsed it into the JSON AST with new String(is.readAllBytes, StandardCharsets.UTF_8).fromJson[Json]. For my use case that took about 230 +/- 3 ms.
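For comparison, the String baseline can be packaged as a small helper. This is a sketch assuming zio-json on the classpath and Java 9 or later for InputStream.readAllBytes; the object and method names are hypothetical. The whole input is materialized in memory (first as bytes, then as a String), and fromJson reports failures as Left(message) rather than throwing.

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets

import zio.json._
import zio.json.ast.Json

object StringBaseline {
  // Hypothetical helper: reads the entire stream into a String, then parses it.
  // Memory use grows with input size; decode errors come back as Left(message).
  def decodeViaString(is: InputStream): Either[String, Json] =
    new String(is.readAllBytes(), StandardCharsets.UTF_8).fromJson[Json]
}
```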

It seemed I should not have to hold the whole input in memory, though, so with a pointer from @erikvanoosten on Discord, I wrote a version that did this: JsonDecoder[Json].decodeJsonStreamInput(ZStream.fromInputStream(is), StandardCharsets.UTF_8). However, that took 955 +/- 12 ms, roughly four times as long as the String approach. On investigation, I found that decodeJsonStreamInput takes my ZStream, converts it back into an InputStream, and wraps that in a Reader.
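A self-contained version of that streaming call might look like the following. It is a sketch assuming zio-json and zio-streams on the classpath; the wrapper object and method name are hypothetical. ZStream.fromInputStream fails with IOException, which widens to the Throwable error type that decodeJsonStreamInput expects.

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets

import zio._
import zio.json._
import zio.json.ast.Json
import zio.stream.ZStream

object StreamVersion {
  // Wraps the InputStream as a byte ZStream and hands it to decodeJsonStreamInput.
  // Per the investigation above, the JVM implementation converts the stream back into
  // an InputStream/Reader before decoding, so the extra layers add overhead here.
  def decodeViaStream(is: InputStream): ZIO[Any, Throwable, Json] =
    JsonDecoder[Json].decodeJsonStreamInput(ZStream.fromInputStream(is), StandardCharsets.UTF_8)
}
```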

So I took what I learned above and built a user-level solution from pieces copied out of the private implementation details in ZIO JSON. For my use case it ran in about 330 +/- 6 ms. That is much better than decodeJsonStreamInput, though admittedly still materially slower (by about 100 ms) than building and parsing the String! Here is the implementation I'm using in user code, outside ZIO JSON:

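```scala
import java.io.{BufferedReader, InputStream, InputStreamReader, Reader}
import java.nio.charset.{Charset, StandardCharsets}

import zio._
import zio.json._
import zio.json.internal.WithRetractReader

object InputStreamDecoding {

  final def decodeInputStream[R, A](
      decoder: JsonDecoder[A],
      is: InputStream,
      charset: Charset = StandardCharsets.UTF_8,
      bufferSize: Int = 8192 // Taken from BufferedInputStream.DEFAULT_BUFFER_SIZE
  ): ZIO[R, Throwable, A] = {
    // zio.json.internal.UnexpectedEnd is private, so it cannot be matched by type here.
    // A local copy of the class would never match what the library throws; compare names.
    def isUnexpectedEnd(t: Throwable): Boolean =
      t.getClass.getName == "zio.json.internal.UnexpectedEnd"

    // Adapted from the private readAll in the JVM JsonDecoderPlatformSpecific.
    def readAll(reader: Reader): ZIO[Any, Throwable, A] =
      ZIO.attemptBlocking {
        try decoder.unsafeDecode(Nil, new WithRetractReader(reader))
        catch {
          case JsonDecoder.UnsafeJson(trace)      => throw new Exception(JsonError.render(trace))
          case e: Exception if isUnexpectedEnd(e) => throw new Exception("unexpected end of input")
        }
      }

    // Buffer at the Reader level only. Closing the reader when the scope ends also closes `is`.
    ZIO.scoped[R] {
      ZIO
        .fromAutoCloseable(
          ZIO.succeed(new BufferedReader(new InputStreamReader(is, charset), bufferSize))
        )
        .flatMap(readAll)
    }
  }
}
```

Notes:

  • The unexpected-end handling mirrors the private UnexpectedEnd class in package zio.json.internal (readers.scala). Because that class is private, user code cannot refer to it, and a local copy of the class would never match what the library throws, so the code above matches on its fully qualified class name instead.
  • readAll is adapted from the private method of that name in the JVM JsonDecoderPlatformSpecific.
  • I experimented with buffering only at the InputStream level, only at the Reader level, or at both. For my (single) test case, buffering only at the Reader level, as shown here, performed best. The Javadoc for InputStreamReader makes the same recommendation "for top efficiency" (although I had initially expected buffering at the InputStream level to work better).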
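The three buffering configurations compared in the last note can be written out as follows. This is a stdlib-only sketch; the object and method names are hypothetical, and drain only approximates a decoder that pulls characters roughly one at a time. A plausible reason Reader-level buffering wins, consistent with the InputStreamReader Javadoc, is that BufferedReader serves most single-character reads from its own char array instead of going through the charset decoder on every call.

```scala
import java.io.{BufferedInputStream, BufferedReader, InputStream, InputStreamReader, Reader}
import java.nio.charset.Charset

object BufferingVariants {
  val BufferSize = 8192

  // Buffering only at the InputStream level: the decoder reads from buffered bytes.
  def inputStreamBuffered(is: InputStream, cs: Charset): Reader =
    new InputStreamReader(new BufferedInputStream(is, BufferSize), cs)

  // Buffering only at the Reader level (the fastest variant in the single test above).
  def readerBuffered(is: InputStream, cs: Charset): Reader =
    new BufferedReader(new InputStreamReader(is, cs), BufferSize)

  // Buffering at both levels.
  def bothBuffered(is: InputStream, cs: Charset): Reader =
    new BufferedReader(new InputStreamReader(new BufferedInputStream(is, BufferSize), cs), BufferSize)

  // Reads one char at a time until end of input, then closes the reader.
  def drain(r: Reader): String = {
    val sb = new StringBuilder
    var c = r.read()
    while (c != -1) { sb.append(c.toChar); c = r.read() }
    r.close()
    sb.toString
  }
}
```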