exasol/parquet-io-java

Possible bug when reading LIST with another list type


java.lang.ClassCastException: Expected instance of primitive converter but got "com.exasol.parquetio.reader.converter.RepeatedGroupConverter"
org.apache.parquet.io.api.Converter.asPrimitiveConverter(Converter.java:30)
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:177)
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
com.exasol.parquetio.reader.RowParquetChunkReader$RowIterator.loadNext(RowParquetChunkReader.java:199)
com.exasol.parquetio.reader.RowParquetChunkReader$RowIterator.<init>(RowParquetChunkReader.java:165)
com.exasol.parquetio.reader.RowParquetChunkReader.iterator(RowParquetChunkReader.java:109)
com.exasol.adapter.document.documentfetcher.files.parquet.ParquetDocumentFetcher.readDocuments(ParquetDocumentFetcher.java:38)
com.exasol.adapter.document.documentfetcher.files.FilesDocumentFetcher.readLoadedFile(FilesDocumentFetcher.java:61)
com.exasol.adapter.document.iterators.FlatMapIterator.loadNext(FlatMapIterator.java:45)
com.exasol.adapter.document.iterators.FlatMapIterator.<init>(FlatMapIterator.java:30)
com.exasol.adapter.document.documentfetcher.files.FilesDocumentFetcher.run(FilesDocumentFetcher.java:56)
com.exasol.adapter.document.DataProcessingPipeline.run(DataProcessingPipeline.java:36)
com.exasol.adapter.document.GenericUdfCallHandler.run(GenericUdfCallHandler.java:97)
com.exasol.adapter.document.UdfEntryPoint.run(UdfEntryPoint.java:29)
com.exasol.ExaWrapper.run(ExaWrapper.java:197)
 (Session: 1735602946942500864)

The issue was a problematic Parquet array definition, for example:

message retail {
... 
  optional group Products (LIST) {
    repeated group array {
      optional binary id (STRING);
      optional int32 ts;
      optional binary product_name (STRING);
...
}

From the documentation:

  2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required.

For an array whose repeated group has multiple fields, each element should be required instead of optional.
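To make the quoted rule concrete, here is a minimal sketch of how a reader could decide whether the repeated field of a LIST group is itself the element type (the legacy two-level layout) or only a wrapper around the element. The class and method names are hypothetical and are not part of parquet-io-java or the Parquet Java library. Rule 2 is the one quoted above; rules 1 and 3 are paraphrased from the same backward-compatibility section of the Parquet documentation, so please check them against the specification before relying on this.

```java
/**
 * Sketch of the LIST backward-compatibility check.
 * Hypothetical helper: not the actual parquet-io-java implementation.
 */
final class ListElementRule {

    private ListElementRule() {
        // static helper only
    }

    /**
     * Returns true if the repeated field itself is the element type,
     * false if it only wraps a single element field.
     *
     * @param listName     name of the LIST-annotated group, e.g. "Products"
     * @param repeatedName name of the repeated field, e.g. "array" or "list"
     * @param isGroup      whether the repeated field is a group
     * @param fieldCount   number of fields inside the repeated group (ignored for primitives)
     */
    static boolean repeatedFieldIsElement(String listName, String repeatedName,
                                          boolean isGroup, int fieldCount) {
        if (!isGroup) {
            return true; // rule 1: a repeated primitive is the element
        }
        if (fieldCount > 1) {
            return true; // rule 2: a repeated group with multiple fields is the element
        }
        // rule 3: a single-field group named "array" or "<listName>_tuple" is the element
        return repeatedName.equals("array") || repeatedName.equals(listName + "_tuple");
    }

    public static void main(String[] args) {
        // Layout from this issue: repeated group "array" with id, ts, product_name
        System.out.println(repeatedFieldIsElement("Products", "array", true, 3)); // true
        // Wrapper layout: repeated group "list" with a single "element" field
        System.out.println(repeatedFieldIsElement("Products", "list", true, 1));  // false
    }
}
```

Under this sketch, the `Products` schema from this issue falls under rule 2, so the repeated `array` group would be read as the element (a struct) rather than as a wrapper around one.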

However, the reader still does not support the following backward-compatible array definition:

message retail {
...
  optional group Products (LIST) {
    repeated group array {
      required binary id (STRING);
      required int32 ts;
      required binary product_name (STRING);
...
}

Therefore, I am keeping this issue open and changing its type to feature.

I am going to close this issue, since it was a single case of backward-compatibility support. We can reopen it if there are similar requests in the future.
cc: @ckunki