spotify/magnolify

ParquetType throws error writing Optional empty list

clairemcginty opened this issue · 1 comment

ParquetType can write empty lists when the list field is top-level, but not when it's wrapped in an Option. Repro:

$ sbt parquet/test:console
scala> import magnolify.parquet._
scala> import magnolify.parquet.ParquetArray.AvroCompat._

scala> case class RegularList(f: List[Int])
scala> case class OptionalList(f: Option[List[Int]])

scala> val writerRegularList = ParquetType[RegularList].writeBuilder(new TestOutputFile()).build()
scala> val writerOptionalList = ParquetType[OptionalList].writeBuilder(new TestOutputFile()).build()

// Succeeds
scala> writerRegularList.write(RegularList(List()))

// Throws error
scala> writerOptionalList.write(OptionalList(Some(List())))
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
  at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:329)
  at org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:162)
  at magnolify.parquet.ParquetField$$anon$9.write(ParquetField.scala:335)
  at magnolify.parquet.ParquetField.writeGroup(ParquetField.scala:58)
  at magnolify.parquet.ParquetField.writeGroup$(ParquetField.scala:54)
  at magnolify.parquet.ParquetField$$anon$9.writeGroup(ParquetField.scala:309)
  at magnolify.parquet.ParquetField$$anon$7.$anonfun$write$2(ParquetField.scala:290)
  at magnolify.parquet.ParquetField$$anon$7.$anonfun$write$2$adapted(ParquetField.scala:290)
  at scala.Option.foreach(Option.scala:437)
  at magnolify.parquet.ParquetField$$anon$7.write(ParquetField.scala:290)
  at magnolify.parquet.ParquetField$$anon$7.write(ParquetField.scala:280)
  at magnolify.parquet.ParquetField.writeGroup(ParquetField.scala:58)
  at magnolify.parquet.ParquetField.writeGroup$(ParquetField.scala:54)
  at magnolify.parquet.ParquetField$$anon$7.writeGroup(ParquetField.scala:280)
  at magnolify.parquet.ParquetField$$anon$3.$anonfun$write$1(ParquetField.scala:134)
  at magnolify.parquet.ParquetField$$anon$3.$anonfun$write$1$adapted(ParquetField.scala:129)
  at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
  at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
  at magnolify.parquet.ParquetField$$anon$3.write(ParquetField.scala:129)
  at magnolify.parquet.ParquetType$$anon$1.write(ParquetType.scala:99)
  at magnolify.parquet.ParquetType$WriteSupport.write(ParquetType.scala:203)
  at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
  ... 59 elided
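The trace shows where the two cases diverge. The `Option` writer (`scala.Option.foreach`) hands the inner list to the list writer, and that writer's `endField` is what fails. The toy model below reproduces this pattern. It rests on two assumptions that the issue does not confirm. First, a case-class field is omitted entirely when its value is empty, which would explain why `RegularList(List())` succeeds. Second, the AvroCompat list writer always opens and closes an inner `array` field. `ToyConsumer` and `EmptyFieldModel` are hypothetical. They only imitate Parquet's "empty fields are illegal" rule and are not the real `RecordConsumer` or magnolify code.

```scala
// Hypothetical stand-in for Parquet's RecordConsumer: it only tracks whether
// each started field received at least one value/group before endField.
final class ToyConsumer {
  // One counter per currently open field (innermost first).
  private var open: List[Int] = Nil

  def startField(name: String): Unit = { open = 0 :: open }

  def endField(name: String): Unit = {
    val written = open.head
    open = open.tail
    // Mirrors the rule behind "empty fields are illegal".
    if (written == 0)
      throw new IllegalStateException(s"empty field '$name' is illegal")
  }

  def addInteger(v: Int): Unit = bump()
  def startGroup(): Unit = bump()
  def endGroup(): Unit = ()

  private def bump(): Unit = { open = (open.head + 1) :: open.tail }
}

object EmptyFieldModel {
  // Assumption: the record writer skips a field whose value is empty,
  // so RegularList(Nil) never starts field "f".
  def writeRegular(c: ToyConsumer, f: List[Int]): Unit =
    if (f.nonEmpty) {
      c.startField("f")
      writeAvroList(c, f)
      c.endField("f")
    }

  // Some(Nil) is not empty, so field "f" is started and the list is written.
  def writeOptional(c: ToyConsumer, f: Option[List[Int]]): Unit =
    if (f.nonEmpty) {
      c.startField("f")
      f.foreach(xs => writeAvroList(c, xs))
      c.endField("f")
    }

  // Assumption: AvroCompat-style list = group wrapping a repeated "array" field.
  private def writeAvroList(c: ToyConsumer, xs: List[Int]): Unit = {
    c.startGroup()
    c.startField("array")
    xs.foreach(c.addInteger) // writes nothing for an empty list
    c.endField("array")      // so this throws when xs is empty
    c.endGroup()
  }
}
```

In this model, `writeOptional(c, Some(Nil))` throws at `endField("array")`. By contrast, `writeRegular(c, Nil)` never starts field `f` at all, which matches the behaviour reported in the repro.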

On the user side, the easiest workaround is probably to drop the Option wrapper around the List, but this seems worth looking into in Magnolify, IMO.
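The suggested workaround can be sketched as a pair of conversions at the I/O boundary. This assumes the rest of the pipeline keeps using `OptionalList` and only the Parquet-facing type is `RegularList`. `DropOptionWorkaround`, `toRegular`, and `toOptional` are hypothetical helpers. The converted `RegularList` values would then be written with `ParquetType[RegularList]` as in the repro, which is left out here to keep the sketch self-contained.

```scala
// The two shapes from the repro.
final case class OptionalList(f: Option[List[Int]])
final case class RegularList(f: List[Int])

// Hypothetical helpers for converting at the Parquet boundary.
object DropOptionWorkaround {
  // None and Some(Nil) both collapse to an empty list before writing.
  def toRegular(o: OptionalList): RegularList = RegularList(o.f.getOrElse(Nil))

  // After reading, an empty list has to be mapped back to one of the two;
  // this sketch chooses None.
  def toOptional(r: RegularList): OptionalList =
    OptionalList(if (r.f.isEmpty) None else Some(r.f))
}
```

The trade-off is that `None` and `Some(List())` become indistinguishable once written, so `toOptional` has to pick one of them for empty lists.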

Strange, I thought there was a check when transforming the field repetition (from REQUIRED to REPEATED or OPTIONAL), and that it would throw here when changing the repetition of a field that is already REPEATED or OPTIONAL.
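The check described here can be modeled as a guard on repetition changes. A Parquet field is REQUIRED, OPTIONAL, or REPEATED. Wrapping a bare repeated list in an `Option` would mean turning an already REPEATED field into an OPTIONAL one, which the guard rejects. The `Repetition`, `Field`, and `RepetitionCheck.setRepetition` definitions below are hypothetical simplifications for illustration. They are not magnolify's or Parquet's API.

```scala
// Simplified model of Parquet repetition levels (hypothetical, not Parquet's Type.Repetition).
sealed trait Repetition
object Repetition {
  case object REQUIRED extends Repetition
  case object OPTIONAL extends Repetition
  case object REPEATED extends Repetition
}

final case class Field(name: String, repetition: Repetition)

object RepetitionCheck {
  // Only a REQUIRED field may become OPTIONAL or REPEATED; changing an
  // already OPTIONAL/REPEATED field fails with IllegalArgumentException.
  def setRepetition(field: Field, to: Repetition): Field = {
    require(
      field.repetition == Repetition.REQUIRED,
      s"cannot make '${field.name}' $to: it is already ${field.repetition}"
    )
    field.copy(repetition = to)
  }
}
```

In this model, deriving `List[Int]` marks the field REPEATED, and wrapping the result in `Option` then fails the `require`. Per the follow-up below, the real check was not reached in the reported case.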

It looks like schema construction isn't invoked in your case, but the intent was not to support such classes.