Possible bug when reading LIST with another list type
Closed this issue · 2 comments
morazow commented
java.lang.ClassCastException: Expected instance of primitive converter but got "com.exasol.parquetio.reader.converter.RepeatedGroupConverter"
org.apache.parquet.io.api.Converter.asPrimitiveConverter(Converter.java:30)
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:177)
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
com.exasol.parquetio.reader.RowParquetChunkReader$RowIterator.loadNext(RowParquetChunkReader.java:199)
com.exasol.parquetio.reader.RowParquetChunkReader$RowIterator.<init>(RowParquetChunkReader.java:165)
com.exasol.parquetio.reader.RowParquetChunkReader.iterator(RowParquetChunkReader.java:109)
com.exasol.adapter.document.documentfetcher.files.parquet.ParquetDocumentFetcher.readDocuments(ParquetDocumentFetcher.java:38)
com.exasol.adapter.document.documentfetcher.files.FilesDocumentFetcher.readLoadedFile(FilesDocumentFetcher.java:61)
com.exasol.adapter.document.iterators.FlatMapIterator.loadNext(FlatMapIterator.java:45)
com.exasol.adapter.document.iterators.FlatMapIterator.<init>(FlatMapIterator.java:30)
com.exasol.adapter.document.documentfetcher.files.FilesDocumentFetcher.run(FilesDocumentFetcher.java:56)
com.exasol.adapter.document.DataProcessingPipeline.run(DataProcessingPipeline.java:36)
com.exasol.adapter.document.GenericUdfCallHandler.run(GenericUdfCallHandler.java:97)
com.exasol.adapter.document.UdfEntryPoint.run(UdfEntryPoint.java:29)
com.exasol.ExaWrapper.run(ExaWrapper.java:197)
(Session: 1735602946942500864)
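One plausible reading of the trace, sketched with hypothetical stand-in classes (`SketchConverter` and friends are illustrative names, not the parquet-java or parquet-io-java sources): while building the record reader, the converter bound to a leaf column is asked for a primitive converter, and the base implementation throws when that converter is actually a group converter. The message format below is copied from the exception above; everything else is an assumption.

```java
// Hypothetical stand-ins that mirror the shape of a Parquet converter hierarchy.
// They are NOT the parquet-java classes themselves.
abstract class SketchConverter {
    abstract boolean isPrimitive();

    // Base behaviour: anything that is not a primitive converter refuses the cast.
    SketchPrimitiveConverter asPrimitiveConverter() {
        throw new ClassCastException(
                "Expected instance of primitive converter but got \"" + getClass().getName() + "\"");
    }
}

class SketchPrimitiveConverter extends SketchConverter {
    @Override
    boolean isPrimitive() {
        return true;
    }

    @Override
    SketchPrimitiveConverter asPrimitiveConverter() {
        return this;
    }
}

// Stand-in for a group converter that ended up where a leaf converter was expected.
class SketchRepeatedGroupConverter extends SketchConverter {
    @Override
    boolean isPrimitive() {
        return false;
    }
}

final class ConverterMismatchDemo {
    /** Mimics a reader binding a leaf (primitive) column to its converter. */
    static String bindLeaf(SketchConverter converter) {
        try {
            converter.asPrimitiveConverter();
            return "ok";
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(bindLeaf(new SketchPrimitiveConverter()));
        System.out.println(bindLeaf(new SketchRepeatedGroupConverter()));
    }
}
```

If this reading is right, the converter tree built by the reader and the column layout of the file schema disagree about which level of the `LIST` group holds the leaf columns.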
morazow commented
The issue was a problematic Parquet array definition, for example:
message retail {
...
optional group Products (LIST) {
repeated group array {
optional binary id (STRING);
optional int32 ts;
optional binary product_name (STRING);
...
}
From the documentation:
- 2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required
So, for an array whose repeated group has multiple fields, each element should be required instead of optional.
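To make the rule concrete, here is a minimal sketch of how a reader could decide whether the repeated field inside a `LIST` group is itself the element type. Only rule 2 is quoted in this issue; rules 1, 3 and 4 in the comments are paraphrased from my recollection of the backward-compatibility section of the Parquet format specification and should be checked against the version in use. `SchemaField` and `ListElementRules` are hypothetical names, not parquet-java or parquet-io-java classes.

```java
// Hypothetical, simplified schema model for illustration; not parquet-java classes.
final class SchemaField {
    final String name;
    final boolean isGroup;
    final SchemaField[] children; // empty for primitives

    private SchemaField(String name, boolean isGroup, SchemaField... children) {
        this.name = name;
        this.isGroup = isGroup;
        this.children = children;
    }

    static SchemaField primitive(String name) {
        return new SchemaField(name, false);
    }

    static SchemaField group(String name, SchemaField... children) {
        return new SchemaField(name, true, children);
    }
}

final class ListElementRules {
    /**
     * Returns true when the repeated field is itself the element type (legacy layouts),
     * false when it only wraps a single element field.
     */
    static boolean repeatedFieldIsElement(String listName, SchemaField repeated) {
        if (!repeated.isGroup) {
            return true; // rule 1: repeated primitive
        }
        if (repeated.children.length > 1) {
            return true; // rule 2: group with multiple fields (the case in this issue)
        }
        // rule 3: single-field group with a legacy name
        return repeated.name.equals("array") || repeated.name.equals(listName + "_tuple");
        // rule 4 (otherwise): the group is treated as a wrapper around its single field
    }

    public static void main(String[] args) {
        SchemaField array = SchemaField.group("array",
                SchemaField.primitive("id"),
                SchemaField.primitive("ts"),
                SchemaField.primitive("product_name"));
        System.out.println(repeatedFieldIsElement("Products", array));
    }
}
```

For the `Products` schema above, the repeated group `array` has several fields, so under rule 2 the group itself is the element and its fields are columns of that element rather than a wrapper layer.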
However, the reader still does not support the following backward-compatible array definition:
message retail {
...
optional group Products (LIST) {
repeated group array {
required binary id (STRING);
required int32 ts;
required binary product_name (STRING);
...
}
Therefore, I am keeping this issue open and changing its type to feature.