pacman82/odbc2parquet

Physical Type of Decimals

bkief opened this issue · 6 comments

bkief commented

My Rust is well "rusty", but am I correct in understanding that all ODBC Decimal Types convert to FIXED_LEN_BYTE_ARRAY. I have a Oracle NUMBER(5,3) and I was surprised its physical_type was not int32

impl ColumnFetchStrategy for Decimal {

bkief commented

Would it be as simple as changing this line

DataType::Decimal {
scale: 0,
precision: p @ 0..=9,
}
| DataType::Numeric {
scale: 0,
precision: p @ 0..=9,
} => fetch_decimal_as_identical_with_precision::<Int32Type>(is_optional, p as i32),

to scale: p @ 0..=9,

Hello @bkief,

My Rust is well "rusty", but am I correct in understanding that all ODBC Decimal Types convert to FIXED_LEN_BYTE_ARRAY. I have a Oracle NUMBER(5,3) and I was surprised its physical_type was not int32

Yes, you are correct. All Decimals with a scale different from 0 are currently represented as FIXED_LEN_BYTE_ARRAY. The binary representation in there is a twos_complement and not what some might suspect, their text representation.

Would it be as simple as changing this line

No, because ODBC would not represent the DECIMAL as an integer. identical in that context means that the ODBC and Parquet representation of the type in question are identical and can be copied without any transformation. Yet the most reliable way to get decimals out of ODBC is to transfer them in their text representation. So some conversion must always happen, but for decimals with scale 0 which ODBC will happily convert into integers for me.

Cheers, Markus

Would it be better for you if the physical type were i32? If so, how?

bkief commented

I was hoping to use the delta_bitpacked encoding. I have time-series data that lends itself very well to that format.

Thanks, I know understand much better. Could you think of any reason why someone might not want to have the decimal backed by a Physical integer type?

Hello @bkief ,

odbc2parquet 0.10.0 is released. The behaviour changes in terms of physical types are as follows:

precision 0 to 9 (inclusive) => INT32
precision 10 to 18 (inclusive) => INT64
precision 18 to 38 (inclusive) => FIXED_LEN_BYTE_ARRAY

Cheers, Markus