[avro-fastserde] Cached fast (de)serializers are not updated after setting schema
maciejkowalczyk opened this issue · 0 comments
When the writer schema is not known at the time of FastSpecificDatumReader creation, we pass null as the writerSchema constructor parameter.
Then, even after setting the proper writer schema using setSchema(), we get an NPE during read():
java.lang.NullPointerException: Cannot invoke "org.apache.avro.Schema.equals(Object)" because "writer" is null
at org.apache.avro.Schema.applyAliases(Schema.java:1832)
at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:131)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at com.linkedin.avro.fastserde.FastSerdeCache$FastDeserializerWithAvroSpecificImpl.deserialize(FastSerdeCache.java:543)
at com.linkedin.avro.fastserde.FastGenericDatumReader.read(FastGenericDatumReader.java:89)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
(...)
Reproduction test:
final Schema writerSchema = Schema.create(Schema.Type.LONG);
final DataFileWriter<Long> writer = new DataFileWriter<>(new FastGenericDatumWriter<>(writerSchema));
final ByteArrayOutputStream byos = new ByteArrayOutputStream();
writer.create(writerSchema, byos);
writer.append(12345L);
writer.close();
final Schema readerSchema = Schema.create(Schema.Type.LONG);
final FastGenericDatumReader<Long> datumReader = new FastGenericDatumReader<>(null, readerSchema);
final DataFileReader<Long> reader = new DataFileReader<>(new SeekableByteArrayInput(byos.toByteArray()),
    // this updates datumReader.writerSchema based on metadata in the data file
    datumReader);
reader.next();
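
For illustration, the behaviour described in the title can be modelled without Avro at all. The following is only a toy sketch, not avro-fastserde code: CachedReaderSketch, its refreshOnSetSchema flag, and the string "schemas" are all hypothetical names invented for this example. It assumes the deserializer is built once with whatever writer schema is known at construction time, and shows how a setSchema() that does not rebuild the cached deserializer would keep failing, while one that rebuilds it would not.

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Toy model only: schemas are plain strings, and none of these names
 * exist in avro-fastserde. It mimics a reader that caches a deserializer.
 */
final class CachedReaderSketch {
    private final String readerSchema;
    private final boolean refreshOnSetSchema; // hypothetical switch: false = stale cache, true = refreshed cache
    private String writerSchema;
    private Function<Long, String> cachedDeserializer;

    CachedReaderSketch(String writerSchema, String readerSchema, boolean refreshOnSetSchema) {
        this.writerSchema = writerSchema;
        this.readerSchema = readerSchema;
        this.refreshOnSetSchema = refreshOnSetSchema;
        // Built once, capturing the writer schema known right now (possibly null).
        this.cachedDeserializer = build(writerSchema, readerSchema);
    }

    void setSchema(String newWriterSchema) {
        this.writerSchema = newWriterSchema;
        if (refreshOnSetSchema) {
            // Rebuild so the cached deserializer sees the new writer schema.
            this.cachedDeserializer = build(newWriterSchema, readerSchema);
        }
        // Otherwise the cached deserializer still holds the old (null) writer schema.
    }

    String getSchema() {
        return writerSchema;
    }

    String read(long datum) {
        return cachedDeserializer.apply(datum);
    }

    private static Function<Long, String> build(String writer, String reader) {
        // The lambda captures 'writer'; a null writer only fails at read time.
        return datum -> {
            Objects.requireNonNull(writer, "writer schema is null");
            return writer + "->" + reader + ":" + datum;
        };
    }
}
```

In this model, a reader created with a null writer schema and refreshOnSetSchema = false still throws a NullPointerException from read() after setSchema("long"), even though getSchema() already returns "long". With refreshOnSetSchema = true the same sequence succeeds. Whether the real fix belongs in setSchema() or in a lazy lookup inside read() is left open here.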