Add RowBinary format for SELECT operations
slvrtrn opened this issue · 2 comments
Work in progress: #257
If you have any feedback on the following, please don't hesitate to comment.
Streaming will be supported out of the box by design.
Target supported ClickHouse data types with SELECT operations for the initial release; certain types are already completed in the branch:
- Boolean
- (U)Int8-32 as
number
- (U)Int64-256 as a native
BigInt
- Float32, Float64 as a
number
- String as a
string
- Nullable, LowCardinality
- Date, Date32 as either a
JS Date
, astring
, or UNIX seconds as anumber
/BigInt
by default (needs the decision to be consistent with DateTime family types), with a custom mapper support. - DateTime, DateTime64 - with a custom mapper support. DateTime could be either an ISO format
string
orBigInt
UNIX seconds. DateTime64 should be either astring
or UNIX nanoseconds as aBigInt
. - Enum - needs a decision. Maybe as a string by default, with a custom mapper (index) -> T
- UUID as a
string
- FixedString(N) as a
string
- IPv4/v6 as a
string
- Decimal as
string
by default, support custom mapper - Array(T) - should use custom mappers provided for the value type
- Map(K, V) - should use custom mappers provided for K and V types
- Tuple(T1, T2, ...) - should use custom mappers provided for T* types
- Geo types - likely, should use
[number, number]
for Point as the base for the remaining Geo types.
The types that may be included as well, depending on the format complexity:
- Variant(T1, T2, ...) - should use custom mappers provided for T* types.
- SimpleAggregateFunction - could be trivial; needs clarification.
- Nested - needs clarification. Could be tricky.
- JSON - maybe not, as it is still experimental; see also this guide about working with JSON in ClickHouse and the Semi-structured columns RFC.
Related issues:
- #25 - branched out of that one
- #212 - feature request, as RowBinary could provide performance benefits
- #215 - somewhat related; that one was about INSERT operations. However, RowBinary implementation allows us to implement data mappers for specific data types for both SELECT and INSERT; these types are Date, Date32, DateTime, DateTime64, and Decimal (taking Array or Map values into consideration, too), as the user might want to get Luxon/Decimal.js/etc objects right away.
@slvrtrn Thanks for working on this! I'm watching this topic closely since it'll be useful for the work I do. My workload also contains Arrays and Maps (to better cope with semi-hierarchical data) and even some Nested types.
Guess I'll wait until Arrays and Maps are supported before jumping in. I also consider moving away from Nested, since it tends to make things more complicated and support for de-/serializing is often lacking / coming last.
Often I can use Maps to replace Nested data.
Again, thanks for working on this. I plan at least to support in testing once progress is enough for my usecases.
Clickhouse and this Client are both very awesome! 🥇
BTW I'm using both INSERTs and SELECTs.
@cjk, thanks for your feedback. I haven't looked into Nested yet, as the core part was to get the numeric types right first, but I believe it will be one of the trickier data types to implement there.
If you could provide a sample dataset for benchmarks or just table DDLs (I will just generate the random data there in the right shape), that would be great.