Parquet TODO
nevillelyh opened this issue · 2 comments
nevillelyh commented
- Avro array support in AvroWriteSupport
- oldTwoLevelListWriter vs newThreeLevelListWriter
- Avro nullable arrays and arrays of nullables
- Fix parquet.avro.data.supplier with generic records in test #278
- Schema compatibility check in ReadSupport 2aea4e8
- Schema evolution for enums #290
- Schema evolution for arrays 6c00ecb
nevillelyh commented
Turns out the new 3-level list is more complex.
With the default 2-level list, myField: List[T] is written as:
required group myField (LIST) {
  repeated T array;
}
But the Avro counterpart is still "name": "myField", "type": "array", "items": T
With the 3-level list, the Parquet schema becomes:
required group myField (LIST) {
  repeated group list {
    required T element;
  }
}
And the Avro record becomes [{"element": t1}, {"element": t2}]
...
WIP in https://github.com/spotify/magnolify/tree/neville/pq-avro
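To make the difference between the two layouts concrete, here is a minimal Python sketch that renders both Parquet schemas for a list field and shows the value shape described above for each. It is only an illustration, not code from parquet-avro or magnolify; the helper names `parquet_list_schema` and `avro_view` are hypothetical.

```python
def parquet_list_schema(name, element_type, three_level=False, repetition="required"):
    """Render Parquet schema text for a list field (illustrative helper).

    three_level=False gives the 2-level layout, True gives the 3-level layout.
    """
    if three_level:
        body = [
            "  repeated group list {",
            f"    required {element_type} element;",
            "  }",
        ]
    else:
        body = [f"  repeated {element_type} array;"]
    return "\n".join([f"{repetition} group {name} (LIST) {{", *body, "}"])


def avro_view(values, three_level=False):
    """Shape of the list values on the Avro side for each layout."""
    if three_level:
        # Each element is wrapped in a record with a single "element" field.
        return [{"element": v} for v in values]
    return list(values)


if __name__ == "__main__":
    print(parquet_list_schema("myField", "T"))
    print(parquet_list_schema("myField", "T", three_level=True))
    print(avro_view(["t1", "t2"], three_level=True))
```

The extra wrapping in the 3-level case is what makes the writer more complex: the values no longer map one-to-one onto a plain Avro array.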
nevillelyh commented
More on Avro array mapping. The following Avro fields
{"name": "field1", "type": {"type": "array", "items": "string"}, "default": [] } // required array field that defaults to empty array
{"name": "field2", "type": ["null", {"type": "array", "items": "string"}], "default": null } // nullable array field that defaults to null
map to:
required group field1 (LIST) {
  repeated binary array (STRING);
}
optional group field2 (LIST) {
  repeated binary array (STRING);
}
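The mapping above can be sketched in a few lines of Python. This assumes the 2-level layout and only string items, as in the example; `avro_array_field_to_parquet` and the `PRIMITIVES` table are hypothetical names for illustration, not part of parquet-avro.

```python
import json

# Hypothetical lookup from Avro primitive names to Parquet declarations;
# only the string case from the example above is covered.
PRIMITIVES = {"string": "binary {name} (STRING)"}


def avro_array_field_to_parquet(field):
    """Render the 2-level Parquet schema text for an Avro array field."""
    avro_type = field["type"]
    repetition = "required"
    # A ["null", array] union marks the field as nullable -> optional group.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if "null" not in avro_type or len(non_null) != 1:
            raise ValueError("expected a union of null and one array type")
        avro_type = non_null[0]
        repetition = "optional"
    if not isinstance(avro_type, dict) or avro_type.get("type") != "array":
        raise ValueError("not an array field")
    element = PRIMITIVES[avro_type["items"]].format(name="array")
    return f"{repetition} group {field['name']} (LIST) {{\n  repeated {element};\n}}"


field1 = json.loads(
    '{"name": "field1", "type": {"type": "array", "items": "string"}, "default": []}'
)
field2 = json.loads(
    '{"name": "field2", "type": ["null", {"type": "array", "items": "string"}], "default": null}'
)

if __name__ == "__main__":
    print(avro_array_field_to_parquet(field1))
    print(avro_array_field_to_parquet(field2))
```

Note that the default value plays no role here: only the presence of "null" in the union changes the group from required to optional.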