IMSMWU/RClickhouse

Add support for bool data type

Closed this issue · 7 comments

R code:

> con <- DBI::dbConnect(RClickhouse::clickhouse(), host="ch-12", db="pivot2")
> tbl(con, "dashboard") %>% filter(date==as.Date('2022-12-01'))
Error in fetch(res@ptr, n) : cannot read unsupported type: Void

Here is the corresponding Clickhouse table schema:

CREATE TABLE pivot2.dashboard
(
    `date` Date,
    `_orig_part` LowCardinality(String),
    `what` LowCardinality(String),
    `hour` UInt8,
    `dsp_id` UInt32,
    `endpoint_id` UInt32,
    `line_item_id` UInt32,
    `external_creative_id` UInt32,
    `ssp` LowCardinality(String),
    `imp_type` LowCardinality(String),
    `names` LowCardinality(String),
    `os` LowCardinality(String),
    `channel_type` LowCardinality(String),
    `country_id` UInt8,
    `state_id` UInt16 CODEC(T64, LZ4),
    `bundle` LowCardinality(String),
    `is_rewarded` UInt8,
    `api` Array(Int8),
    `count` SimpleAggregateFunction(sum, UInt64),
    `win_price_dsp` SimpleAggregateFunction(sum, UInt64),
    `external_bid_price_dsp` SimpleAggregateFunction(sum, UInt64)
)
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/ssd/pivot2/{shard}/dashboard3', '{replica}')
PARTITION BY toYYYYMM(date)
PRIMARY KEY (_orig_part, date, what, ssp, imp_type, dsp_id, channel_type, country_id, endpoint_id, names)
ORDER BY (_orig_part, date, what, ssp, imp_type, dsp_id, channel_type, country_id, endpoint_id, names, os, state_id, is_rewarded, api, hour, line_item_id, bundle, external_creative_id)
TTL date + toIntervalYear(1)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

Additionally:

> DBI::dbGetQuery(con, "SELECT * FROM dashboard where date=today() limit 1")

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: select(conn@ptr, statement)
 2: dbSendQuery(conn, statement, ...)
 3: dbSendQuery(conn, statement, ...)
 4: .local(conn, statement, ...)
 5: DBI::dbGetQuery(con, "SELECT * FROM dashboard where date=today() limit 1")
 6: DBI::dbGetQuery(con, "SELECT * FROM dashboard where date=today() limit 1")

Thanks for reaching out. I don't have a ReplicatedMergeTree in use and, therefore, cannot easily test it. Could you please create a new plain MergeTree table with a small subset of the rows and query it with the RClickhouse package? If it works, the issue is related to the nature of Replicated or Aggregating MergeTrees, and then it is easier for us to debug further.

I just got this error. Minimal reproduction:

> DBI::dbGetQuery(con, "SELECT TRUE;")
Error in fetch(res@ptr, n) : cannot read unsupported type: Void

Thank you for reaching out, and please excuse the late reply. Void-Results are not supported, and I don't see a (timely) need for that. Does the query work if you remove the semicolon? A short workaround would be to use dbSendQuery instead, because it does not return results.

Is TRUE really void? To me looks more like a bool.
Removing the semicolon doesn't work.

> df <- DBI::dbGetQuery(con, "SELECT TRUE")
Error in fetch(res@ptr, n) : cannot read unsupported type: Void

But if I select an int, dbGetQuery works just fine.

> df <- DBI::dbGetQuery(con, "SELECT 1")
> df
  1
1 1

Okay, I thought it was related to the multi-sql statement (the semicolon at the end). However, it is associated with the data type. In the "good old days," Clickhouse used an integer to represent bools, but they changed that (see ClickHouse/ClickHouse#36294). So, we would have to add support for bool. I'll keep the issue open for now. A quick hack would be to convert bools to integers. I currently can't do that, but feel free to provide a pull request.

see #94