Error when reading json array file?
SimunKaracic opened this issue · 6 comments
Specifically, this file https://github.com/statsbomb/open-data/blob/master/data/competitions.json
The file is formatted as a json array, and I would like to read the file in a streaming fashion.
When opening the file with:
json.readJsonAs(path)
  .tap(foo => ZIO.logInfo(foo.asArray.toString))
  .runCount
The entire file is read into a single item, a JSON list, instead of producing a stream with one element per item in the list.
It also throws this error, but seems to recover from it:
22:58:34.586 [zio-default-blocking-2] DEBUG zio.json.JsonDecoderPlatformSpecific -- timestamp=2024-02-22T22:58:34.583386+01:00 level=DEBUG thread=zio-fiber-7 message="Fiber zio-fiber-7 did not handle an error" cause=
zio.json.internal.UnexpectedEnd: if you see this a dev made a mistake using OneCharReader
When I try to read the file as a Stream[Competition] like this:
ZStream
  .fromPath(path.toPath)
  .via(
    ZPipeline.utf8Decode >>>
      stringToChars >>>
      JsonDecoder[Competition].decodeJsonPipeline(JsonStreamDelimiter.Array)
  )
  .runCount
I get a StackOverflowError:
23:00:26.273 [ZScheduler-Worker-9] DEBUG foo.bar.Main.run -- timestamp=2024-02-22T23:00:26.271057+01:00 level=DEBUG thread=zio-fiber-5 message="Fiber zio-fiber-5 did not handle an error" cause=
java.lang.StackOverflowError: null
The stack trace points directly to the derived JSON decoder for the Competition class.
Class and codec:
case class Competition(
  competition_id: Option[Int],
  season_id: Option[Int],
  country_name: Option[String],
  competition_name: Option[String],
  competition_gender: Option[String],
  competition_youth: Option[Boolean],
  competition_international: Option[Boolean],
  season_name: Option[String],
  match_updated: Option[String],
  match_available: Option[String]
)
object Competition {
  implicit val decoder: JsonDecoder[Competition] = DeriveJsonDecoder.gen[Competition]
}
ZIO-json version:
0.6.2
OK, so this was a weird one. I was running all the code inside a single Main.scala file.
The StackOverflowError disappears if I define the Competition class and decoder outside of the Main object.
Only this error remains, but the result seems to be fine:
20:10:09.146 [zio-default-blocking-2] DEBUG zio.json.JsonDecoderPlatformSpecific -- timestamp=2024-02-23T20:10:09.144538+01:00 level=DEBUG thread=zio-fiber-7 message="Fiber zio-fiber-7 did not handle an error" cause=
zio.json.internal.UnexpectedEnd: if you see this a dev made a mistake using OneCharReader
/bounty $100
💎 $100 bounty • ZIO
Steps to solve:
- Start working: Comment /attempt #1071 with your implementation plan
- Submit work: Create a pull request including /claim #1071 in the PR body to claim the bounty
- Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts
Thank you for contributing to zio/zio-json!
@SimunKaracic, I cannot reproduce this error, either in tests or in the main class as you mentioned. Could you share the whole Main.scala file, along with the relevant environment details: your platform, Scala version, and ZIO version? (Thanks for mentioning the zio-json version!)
I am no longer able to reproduce the bug either, as I threw away the original exploratory code.
I guess it would still be nice if we had something like this in zio-json, to support loading JSON arrays from files:
def readJsonArrayAs[T: JsonDecoder](path: Path): ZStream[Any, Throwable, T] = {
  ZStream
    .fromPath(path)
    .via(
      ZPipeline.utf8Decode >>>
        stringToChars >>>
        JsonDecoder[T].decodeJsonPipeline(JsonStreamDelimiter.Array)
    )
}
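For anyone who wants to try the Array-delimited pipeline without a file on disk, here is a minimal in-memory sketch. The `Item` type, the payload, and the `ArrayPipelineSketch` object are made up for illustration and are not part of zio-json or of my original code; only `decodeJsonPipeline` and `JsonStreamDelimiter.Array` are the same calls used in the helper above.

```scala
import zio._
import zio.json._
import zio.stream._

object ArrayPipelineSketch extends ZIOAppDefault {
  // Hypothetical element type, defined only for this sketch.
  final case class Item(id: Int)
  object Item {
    implicit val decoder: JsonDecoder[Item] = DeriveJsonDecoder.gen[Item]
  }

  // Hypothetical payload standing in for the contents of a JSON array file.
  val payload: String = """[{"id":1},{"id":2},{"id":3}]"""

  // Emit the payload as a stream of characters, then decode the array
  // elements one at a time instead of as a single Json value.
  val items: ZStream[Any, Throwable, Item] =
    ZStream
      .fromChunk(Chunk.fromArray(payload.toCharArray))
      .via(JsonDecoder[Item].decodeJsonPipeline(JsonStreamDelimiter.Array))

  // Number of decoded array elements.
  val countItems: ZIO[Any, Throwable, Long] = items.runCount

  override def run: ZIO[Any, Throwable, Unit] =
    countItems.flatMap(n => Console.printLine(s"Items inside array count: $n"))
}
```

Because the input here is already a stream of `Char`, the `utf8Decode` and `stringToChars` stages are not needed; reading from a path would still require them.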
My attempt at reproducing the bug (also tried lowering versions of scala, but it didn't help):
build.sbt
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "3.4.2"
lazy val root = (project in file("."))
  .settings(
    name := "zio-json-reproduce",
    libraryDependencies ++= Seq(
      "dev.zio" %% "zio" % "2.1.5",
      "dev.zio" %% "zio-json" % "0.6.2"
    )
  )
Main.scala:
import zio.*
import zio.json.*
import zio.stream.*
import java.nio.file.{Path, Paths}
object Main extends ZIOAppDefault {
  case class Competition(
    competition_id: Int,
    season_id: Int,
    country_name: String,
    competition_name: String,
    competition_gender: String,
    competition_youth: Boolean,
    competition_international: Boolean,
    season_name: String,
    match_updated: Option[String],
    match_available: String
  )
  object Competition {
    implicit val decoder: JsonDecoder[Competition] = DeriveJsonDecoder.gen[Competition]
  }
  private def stringToChars: ZPipeline[Any, Nothing, String, Char] =
    ZPipeline.mapChunks[String, Char](_.flatMap(_.toCharArray))
  val path = "competitions.json"
  val loadsWholeFileIntoArray: ZIO[Any, Throwable, Long] =
    json.readJsonAs(path).runCount
  def readJsonArrayAs[T: JsonDecoder](path: Path): ZStream[Any, Throwable, T] = {
    ZStream
      .fromPath(path)
      .via(
        ZPipeline.utf8Decode >>>
          stringToChars >>>
          JsonDecoder[T].decodeJsonPipeline(JsonStreamDelimiter.Array)
      )
  }
  val iteratesThroughArrayOneByOne = readJsonArrayAs(Paths.get(path)).runCount
  override def run: ZIO[Any with ZIOAppArgs with Scope, Any, Any] = {
    loadsWholeFileIntoArray.flatMap { c =>
      ZIO.logInfo(s"Array count: ${c}")
    } *> iteratesThroughArrayOneByOne.flatMap { c =>
      ZIO.logInfo(s"Items inside array count: ${c}")
    }
  }
}