hrbrmstr/sergeant

BIGINT column shows up as col_integer()

bhaskarvk opened this issue · 3 comments

I am not sure yet whether this is a bug, but I have a Parquet table with columns of datatype BIGINT.
When queried through drill_connection(), they show up as col_integer(). I am not sure what the range of col_integer() is, but if it is indeed 4 bytes, then data will be lost when querying BIGINT columns whose values exceed the integer min/max.
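
To make the range concern concrete, here is a minimal base R sketch. It assumes col_integer() columns end up as ordinary R integer vectors, which are signed 32-bit; the variable names are only illustrative and nothing here calls sergeant or Drill.

```r
# R's native integer type is a signed 32-bit value.
int_max <- .Machine$integer.max
print(int_max)                       # 2147483647, i.e. 2^31 - 1

# Largest value a Drill BIGINT (8-byte signed) can hold, kept as text.
big_value <- "9223372036854775807"

# Forcing it into a 32-bit integer cannot work: R returns NA with a warning.
as_int <- suppressWarnings(as.integer(big_value))
print(is.na(as_int))                 # TRUE
```

So if a BIGINT column were always forced to integer, any value above 2147483647 would come back as NA rather than as a truncated number.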

OK, there's some internal cleverness going on. It looks like the datatype is determined at query time based on the actual values being returned.

When the returned rows do contain a value > Max(Int), the column is of type double, but if the returned values are all within Int limits, the column type is integer. E.g.:

>drill_query(dc,'select max(src_bytes) as max_src_bytes from dfs.data.`counts/1970/03/01`;')
  |======================================================================| 100%
Parsed with column specification:
cols(
  max_src_bytes = col_double()
)
# A tibble: 1 x 1
  max_src_bytes
*         <dbl>
1  681542898424

>drill_query(dc,'select src_bytes from dfs.lanl_parquet.`counts/1970` limit 5;')
  |======================================================================| 100%
Parsed with column specification:
cols(
  src_bytes = col_integer()
)
# A tibble: 5 x 1
  src_bytes
*     <int>
1       576
2       138
3       654
4 810104048
5   4176466

>
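
The value-dependent typing seen in the two queries above can be reproduced with base R's `type.convert()`. This is only an analogy under the assumption that the results are parsed from text and the column type is guessed from the values; it is not sergeant's actual parsing code. The input vectors reuse the values printed above.

```r
# Values from the two queries above, as the text a parser would see.
small <- c("576", "138", "654", "810104048", "4176466")
large <- "681542898424"

# type.convert() picks the narrowest type that fits all values.
small_parsed <- type.convert(small, as.is = TRUE)
large_parsed <- type.convert(large, as.is = TRUE)

print(typeof(small_parsed))   # "integer": every value fits in 32 bits
print(typeof(large_parsed))   # "double": the value exceeds the integer range

# A double only represents integers exactly up to 2^53, so falling back
# to double is still lossy for the upper part of the BIGINT range.
exact_limit <- 2^53
print((exact_limit + 1) == exact_limit)   # TRUE
```

Under this kind of guessing, the same column can change type from one query to the next, and values above 2^53 would silently lose precision even when the column is read as double.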

The ODBC driver interface to Drill handles this appropriately.