ParquetType throws error writing Optional empty list
clairemcginty opened this issue · 1 comments
clairemcginty commented
ParquetType can write empty lists when the list field is top-level, but not when it's wrapped in an Option. Repro:
$ sbt parquet/test:console
scala> import magnolify.parquet._
scala> import magnolify.parquet.ParquetArray.AvroCompat._
scala> case class RegularList(f: List[Int])
scala> case class OptionalList(f: Option[List[Int]])
scala> val writerRegularList = ParquetType[RegularList].writeBuilder(new TestOutputFile()).build()
scala> val writerOptionalList = ParquetType[OptionalList].writeBuilder(new TestOutputFile()).build()
// Succeeds
scala> writerRegularList.write(RegularList(List()))
// Throws error
scala> writerOptionalList.write(OptionalList(Some(List())))
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:329)
at org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:162)
at magnolify.parquet.ParquetField$$anon$9.write(ParquetField.scala:335)
at magnolify.parquet.ParquetField.writeGroup(ParquetField.scala:58)
at magnolify.parquet.ParquetField.writeGroup$(ParquetField.scala:54)
at magnolify.parquet.ParquetField$$anon$9.writeGroup(ParquetField.scala:309)
at magnolify.parquet.ParquetField$$anon$7.$anonfun$write$2(ParquetField.scala:290)
at magnolify.parquet.ParquetField$$anon$7.$anonfun$write$2$adapted(ParquetField.scala:290)
at scala.Option.foreach(Option.scala:437)
at magnolify.parquet.ParquetField$$anon$7.write(ParquetField.scala:290)
at magnolify.parquet.ParquetField$$anon$7.write(ParquetField.scala:280)
at magnolify.parquet.ParquetField.writeGroup(ParquetField.scala:58)
at magnolify.parquet.ParquetField.writeGroup$(ParquetField.scala:54)
at magnolify.parquet.ParquetField$$anon$7.writeGroup(ParquetField.scala:280)
at magnolify.parquet.ParquetField$$anon$3.$anonfun$write$1(ParquetField.scala:134)
at magnolify.parquet.ParquetField$$anon$3.$anonfun$write$1$adapted(ParquetField.scala:129)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
at magnolify.parquet.ParquetField$$anon$3.write(ParquetField.scala:129)
at magnolify.parquet.ParquetType$$anon$1.write(ParquetType.scala:99)
at magnolify.parquet.ParquetType$WriteSupport.write(ParquetType.scala:203)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
... 59 elided
On the user side, the easiest solution is probably dropping the Option wrapper around the List, but worth taking a look at in Magnolify imo.
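A minimal sketch of that user-side workaround, in plain Scala with no Magnolify dependency. `DropOptionWrapper` and `toRegular` are hypothetical names used only for illustration; the sketch assumes it is acceptable to collapse `None` and `Some(List())` into the same empty list, which loses the distinction between the two.

```scala
object DropOptionWrapper {
  // Same shapes as in the repro above.
  case class RegularList(f: List[Int])
  case class OptionalList(f: Option[List[Int]])

  // Hypothetical helper: drop the Option wrapper before handing the record
  // to the writer. None and Some(Nil) both become an empty list.
  def toRegular(r: OptionalList): RegularList =
    RegularList(r.f.getOrElse(Nil))

  def main(args: Array[String]): Unit = {
    println(toRegular(OptionalList(Some(List()))))  // RegularList(List())
    println(toRegular(OptionalList(None)))          // RegularList(List())
    println(toRegular(OptionalList(Some(List(1))))) // RegularList(List(1))
  }
}
```

The converted records can then go through the `RegularList` writer, which the repro shows accepts an empty list.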
RustedBones commented
Strange, I thought there was a check when transforming the field presence (from expected to REPEATED or OPTIONAL) and that it would throw when changing presence of an already REPEATED or OPTIONAL field here.
Looks like the schema construction is not called in your case, but the intent was to not support such classes.
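To illustrate the kind of check described above, here is a rough sketch on a simplified model. This is not Magnolify's code: `PresenceCheck`, `Repetition`, and `changePresence` are hypothetical names, and the sketch assumes the starting presence is REQUIRED and that only such a field may be turned into OPTIONAL or REPEATED.

```scala
object PresenceCheck {
  // Simplified stand-in for Parquet's three repetition levels.
  sealed trait Repetition
  case object Required extends Repetition
  case object Optional extends Repetition
  case object Repeated extends Repetition

  // Hypothetical check: reject a presence change on a field that is
  // already OPTIONAL or REPEATED. Throws IllegalArgumentException on failure.
  def changePresence(current: Repetition, target: Repetition): Repetition = {
    require(current == Required, s"cannot change presence from $current to $target")
    target
  }

  def main(args: Array[String]): Unit = {
    // List[Int]: a REQUIRED element becomes REPEATED, which is allowed.
    val list = changePresence(Required, Repeated)
    // Option[List[Int]]: wrapping the REPEATED field in OPTIONAL is rejected.
    val wrapped = scala.util.Try(changePresence(list, Optional))
    println(wrapped.isFailure) // true
  }
}
```

Under this model, `Option[List[Int]]` would fail when the schema is built rather than at the first write of an empty list.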