spotify/scio

Support projections in ParquetAvroFileOperations/ParquetAvroSortedBucketIO

clairemcginty opened this issue · 1 comment

ParquetAvroFileOperations always sets the requested projection to the full reflected schema, overwriting any value already in the Configuration, so you can't supply a projection for a SpecificRecord class:

AvroReadSupport.setAvroReadSchema(configuration, schema);
AvroReadSupport.setRequestedProjection(configuration, schema);
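
To make the override concrete, here is a minimal sketch of how a reader could leave a caller-supplied projection in place and fall back to the full schema only when none is set. It is not taken from the scio source: `ReadSchemaSupport` and `configure` are hypothetical names, and the only library calls assumed are parquet-avro's `AvroReadSupport` setters, its public `AVRO_REQUESTED_PROJECTION` key, and Hadoop's `Configuration.get`.

```scala
import org.apache.avro.Schema
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.avro.AvroReadSupport

// Hypothetical helper, not part of scio or parquet-avro.
object ReadSchemaSupport {
  def configure(conf: Configuration, schema: Schema): Unit = {
    // The read schema is always the full reflected schema.
    AvroReadSupport.setAvroReadSchema(conf, schema)
    // Default the projection to the full schema only if the caller has not set one.
    if (conf.get(AvroReadSupport.AVRO_REQUESTED_PROJECTION) == null) {
      AvroReadSupport.setRequestedProjection(conf, schema)
    }
  }
}
```

With a guard like this, a projection placed in the Configuration before the read is constructed would survive, while callers who set nothing would still read every column.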

#5083 provides a workaround for this via the Configuration parameter:

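import com.spotify.scio.parquet.ParquetConfiguration
import org.apache.avro.Schema
import org.apache.beam.sdk.extensions.smb.ParquetAvroSortedBucketIO
import org.apache.parquet.avro.AvroReadSupport

// Schema containing only the fields to read
val projection: Schema = ...
val configuration = ParquetConfiguration.empty()
// Set the projection on the Configuration that is passed to the read
AvroReadSupport.setRequestedProjection(configuration, projection)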

val read = ParquetAvroSortedBucketIO
  .read(tupleTag, classOf[TestRecord])
  .from(...)
  .withConfiguration(configuration)
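
For illustration, one way to build a projection schema is Avro's `SchemaBuilder`. This is only a sketch under assumptions: the namespace `com.example` and the field `int_field` are hypothetical and would need to mirror the actual `TestRecord` schema, which is not shown in this issue.

```scala
import org.apache.avro.{Schema, SchemaBuilder}

object ProjectionExample {
  // Hypothetical: assumes TestRecord lives in namespace "com.example"
  // and has a required int field named "int_field".
  val projection: Schema = SchemaBuilder
    .record("TestRecord")
    .namespace("com.example")
    .fields()
    .requiredInt("int_field") // the only column requested from Parquet
    .endRecord()
}
```

A schema built this way can be passed to `AvroReadSupport.setRequestedProjection` in the same way as `projection` above.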

In 0.14, we can add projection as a Builder method on ParquetAvroSortedBucketIO.