vmware-archive/hillview

Duration column should be handled by ParquetFileWriter

bin-wang opened this issue · 0 comments

Currently the ParquetFileWriter cannot handle tables with Duration columns, for the following reasons:

  1. Parquet has an Interval logical type with the same meaning as a Hillview Duration, but its format is convoluted: it is essentially three 32-bit integers representing months, days, and milliseconds. If we use only the milliseconds field, there might be a loss of precision.
  2. If we save the Duration column as Double, we currently cannot guarantee that reading back a saved table yields the same format as the original table.
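
To make the first point concrete, here is a minimal sketch of packing a duration into the 12-byte Interval layout using only the milliseconds field. It assumes a Duration is held as a `double` count of milliseconds and that the three fields are little-endian unsigned 32-bit values, as the Parquet format documentation describes. `IntervalSketch` and its methods are hypothetical names for illustration, not Hillview or Parquet APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Hillview or parquet-mr.
final class IntervalSketch {
    // Largest value that fits in an unsigned 32-bit field.
    static final long MAX_UINT32 = 0xFFFFFFFFL;

    // Packs a duration (assumed to be milliseconds as a double) into
    // 12 bytes: months = 0, days = 0, milliseconds = truncated value.
    static byte[] encodeMillisOnly(double durationMs) {
        long whole = (long) durationMs; // drops any fractional milliseconds
        if (whole < 0 || whole > MAX_UINT32) {
            throw new IllegalArgumentException("does not fit in 32 bits: " + durationMs);
        }
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0);           // months
        buf.putInt(0);           // days
        buf.putInt((int) whole); // milliseconds, low 32 bits kept as unsigned
        return buf.array();
    }

    // Reads back only the milliseconds field, interpreting it as unsigned.
    static double decodeMillisOnly(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt(); // skip months
        buf.getInt(); // skip days
        return (double) Integer.toUnsignedLong(buf.getInt());
    }
}
```

Under these assumptions, a value such as 1500.75 ms comes back as 1500.0, and anything longer than about 49.7 days (2^32 - 1 ms) or any negative duration cannot be stored at all unless the days and months fields are also used.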

One possible solution might be to save the column as Double and also save a schema file.
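
A rough sketch of that idea, under some assumptions: the schema file is a plain text sidecar with one `name=Kind` line per column, and a Duration is stored as a `double` number of milliseconds. `SchemaSidecarSketch`, `Kind`, and the file format are all hypothetical and only meant to show how the original column kind could be recovered on read.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not Hillview's actual Schema class or file format.
final class SchemaSidecarSketch {
    enum Kind { Double, Duration }

    // Serializes the column kinds as one "name=Kind" line per column.
    // Assumes column names contain no newline characters.
    static String writeSchema(Map<String, Kind> columns) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Kind> e : columns.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue().name()).append('\n');
        }
        return sb.toString();
    }

    // Parses the sidecar text back into an ordered name -> kind map.
    static Map<String, Kind> readSchema(String text) {
        Map<String, Kind> columns = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) continue;
            int sep = line.lastIndexOf('='); // last '=' so names may contain '='
            columns.put(line.substring(0, sep), Kind.valueOf(line.substring(sep + 1)));
        }
        return columns;
    }

    // Converts a stored double back to the kind recorded in the schema.
    static Object restore(Kind kind, double stored) {
        if (kind == Kind.Duration) {
            // Stored value is assumed to be milliseconds; keep sub-millisecond part.
            return Duration.ofNanos(Math.round(stored * 1_000_000.0));
        }
        return stored;
    }
}
```

With the sidecar next to the Parquet file, the reader would load every such column as Double and then use the recorded kind to decide which ones to convert back to Duration, so a saved table reads back in its original format.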