apache/parquet-java

AvroSchemaConverter toString() conversion when schema has multiple INT96 fields


Describe the bug, including details regarding any error messages, version, and platform.

When an Avro schema that has multiple fields of INT96 type is converted to a string with toString(), the resulting schema string is not correct: only the first INT96 field carries the full type definition.

The issue can be reproduced with the following code:

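import static org.apache.parquet.avro.AvroReadSupport.READ_INT96_AS_FIXED;
import static org.apache.parquet.schema.Types.buildMessage;
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType;
import org.apache.parquet.schema.Types;
Configuration configuration = new Configuration();
configuration.setBoolean(READ_INT96_AS_FIXED, true);
Types.MessageTypeBuilder builder = buildMessage();
builder.optional(PrimitiveType.PrimitiveTypeName.INT96).named("timestamp_1");
builder.optional(PrimitiveType.PrimitiveTypeName.INT96).named("timestamp_2");
MessageType int96Schema = builder.named("int96Schema");
Schema avroSchema = new AvroSchemaConverter(configuration).convert(int96Schema);
System.out.println(avroSchema.toString(true));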

It produces the following schema string:

{
  "type" : "record",
  "name" : "int96Schema",
  "fields" : [ {
    "name" : "timestamp_1",
    "type" : [ "null", {
      "type" : "fixed",
      "name" : "INT96",
      "doc" : "INT96 represented as byte[12]",
      "size" : 12
    } ],
    "default" : null
  }, {
    "name" : "timestamp_2",
    "type" : [ "null", "INT96" ],
    "default" : null
  } ]
}

The schema for field timestamp_2 is not correct: its type is only the name "INT96" instead of the full fixed-type definition.
It should have the same format as timestamp_1:

{
  "name" : "timestamp_2",
  "type" : [ "null", {
    "type" : "fixed",
    "name" : "INT96",
    "doc" : "INT96 represented as byte[12]",
    "size" : 12
  } ],
  "default" : null
}
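
For context, and not as a confirmed diagnosis: in Avro's JSON schema format a named type (record, enum, fixed) is defined once and can afterwards be referred to by its name, which may be why timestamp_2 shows only "INT96". The sketch below illustrates that write-once pattern in plain Java. The class and method names (NamedTypeWriter, writeFixed) are hypothetical, and this is not Avro's or parquet-java's actual implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not part of Avro or parquet-java.
public class NamedTypeWriter {
    // Names of types that have already been written in full.
    private final Set<String> seen = new HashSet<>();

    // Returns the full definition the first time a name is seen,
    // and only the quoted name on every later call.
    public String writeFixed(String name, int size) {
        if (!seen.add(name)) {
            return "\"" + name + "\"";
        }
        return "{\"type\":\"fixed\",\"name\":\"" + name + "\",\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        NamedTypeWriter writer = new NamedTypeWriter();
        System.out.println("timestamp_1: " + writer.writeFixed("INT96", 12));
        System.out.println("timestamp_2: " + writer.writeFixed("INT96", 12));
    }
}
```

Under that reading, a consumer that resolves type names would treat both fields as the same 12-byte fixed type, whereas a consumer that expects every field to be self-describing would need the expanded form shown above.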

Component(s)

Avro