linkedin/avro-util

[avro-fastserde] Cached fast (de)serializers are not updated after setting schema

maciejkowalczyk opened this issue · 0 comments

When the writer schema is not known at the time the FastSpecificDatumReader is created, we pass null as the writerSchema constructor parameter.
Then, even after setting the proper writer schema using setSchema(), we get an NPE during read():

java.lang.NullPointerException: Cannot invoke "org.apache.avro.Schema.equals(Object)" because "writer" is null
	at org.apache.avro.Schema.applyAliases(Schema.java:1832)
	at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:131)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
	at com.linkedin.avro.fastserde.FastSerdeCache$FastDeserializerWithAvroSpecificImpl.deserialize(FastSerdeCache.java:543)
	at com.linkedin.avro.fastserde.FastGenericDatumReader.read(FastGenericDatumReader.java:89)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:251)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
	(...)

Reproduction test:

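        import java.io.ByteArrayOutputStream;
        import org.apache.avro.Schema;
        import org.apache.avro.file.DataFileReader;
        import org.apache.avro.file.DataFileWriter;
        import org.apache.avro.file.SeekableByteArrayInput;
        import com.linkedin.avro.fastserde.FastGenericDatumReader;
        import com.linkedin.avro.fastserde.FastGenericDatumWriter;

        // Write a single long into an in-memory Avro data file.
        final Schema writerSchema = Schema.create(Schema.Type.LONG);
        final DataFileWriter<Long> writer = new DataFileWriter<>(new FastGenericDatumWriter<>(writerSchema));
        final ByteArrayOutputStream byos = new ByteArrayOutputStream();
        writer.create(writerSchema, byos);
        writer.append(12345L);
        writer.close();

        // Create the reader without a writer schema (null), as if it were not known yet.
        final Schema readerSchema = Schema.create(Schema.Type.LONG);
        final FastGenericDatumReader<Long> datumReader = new FastGenericDatumReader<>(null, readerSchema);
        final DataFileReader<Long> reader = new DataFileReader<>(new SeekableByteArrayInput(byos.toByteArray()),
                //this updates datumReader.writerSchema based on metadata in the data file
                datumReader);
        // Throws the NullPointerException shown in the stack trace above.
        reader.next();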
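
The behaviour described in the title can be modelled without Avro. The sketch below is a simplified stand-in, not avro-fastserde code: BoundDeserializer, StaleReader and RefreshingReader are hypothetical names, and schemas are represented as plain strings. It assumes the reader binds a deserializer to whatever schemas it has at construction and keeps using it after setSchema(). One possible remedy, shown in RefreshingReader, is to discard the cached deserializer whenever the writer schema changes.

```java
import java.util.Objects;

/** Hypothetical stand-in for a deserializer bound to a fixed writer/reader schema pair. */
final class BoundDeserializer {
    private final String writerSchema;
    private final String readerSchema;

    BoundDeserializer(String writerSchema, String readerSchema) {
        this.writerSchema = writerSchema;
        this.readerSchema = readerSchema;
    }

    String deserialize(String payload) {
        // Models the failure in the stack trace: resolution needs a non-null writer schema.
        Objects.requireNonNull(writerSchema, "writer schema is null");
        return payload + " (" + writerSchema + " -> " + readerSchema + ")";
    }
}

/** Hypothetical reader that keeps its cached deserializer after the schema changes. */
class StaleReader {
    protected String writerSchema;
    protected final String readerSchema;
    protected BoundDeserializer cached;

    StaleReader(String writerSchema, String readerSchema) {
        this.writerSchema = writerSchema;
        this.readerSchema = readerSchema;
        // Assumption: the deserializer is bound to the schemas known right now, possibly null.
        this.cached = new BoundDeserializer(writerSchema, readerSchema);
    }

    void setSchema(String newWriterSchema) {
        // Only the field is updated; the cached deserializer still holds the old schema.
        this.writerSchema = newWriterSchema;
    }

    String read(String payload) {
        return cached.deserialize(payload);
    }
}

/** Hypothetical variant that drops the cached deserializer when the writer schema changes. */
final class RefreshingReader extends StaleReader {
    RefreshingReader(String writerSchema, String readerSchema) {
        super(writerSchema, readerSchema);
    }

    @Override
    void setSchema(String newWriterSchema) {
        if (!Objects.equals(this.writerSchema, newWriterSchema)) {
            this.writerSchema = newWriterSchema;
            this.cached = null; // force a rebuild on the next read
        }
    }

    @Override
    String read(String payload) {
        if (cached == null) {
            cached = new BoundDeserializer(writerSchema, readerSchema);
        }
        return cached.deserialize(payload);
    }
}
```

In this model, new StaleReader(null, "long") followed by setSchema("long") still throws a NullPointerException from read(), whereas the same sequence on RefreshingReader succeeds because the deserializer is rebuilt with the updated writer schema.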