Polarhouse connects together
- Polars Dataframes (backed by the Apache Arrow columnar format)
- the Clickhouse columnar database.
More specifically, it allows:
- inserting Polars Dataframes into Clickhouse tables (and creating these if necessary).
- and vice-versa retrieving Clickhouse query results as Polars Dataframes.
Communication with Clickhouse is made through the [klickhouse] crate.
Polars
┌──────────┬─────────┬──────┬───────────────────────────┐
│ name ┆ is_rich ┆ age ┆ address │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ u8 ┆ i32 ┆ struct[2] │
╞══════════╪═════════╪══════╪═══════════════════════════╡
│ Batman ┆ 1 ┆ 30 ┆ {{"Chicago","IL"},"USA"} │
│ Superman ┆ null ┆ null ┆ {{"New York","NY"},"USA"} │
└──────────┴─────────┴──────┴───────────────────────────┘
Clickhouse
┌─name─────┬─is_rich─┬──age─┬─address.city.city─┬─address.city.state─┬─address.country─┐
│ Batman │ true │ 30 │ Chicago │ IL │ USA │
│ Superman │ null │ null │ New York │ NY │ USA │
└──────────┴─────────┴──────┴───────────────────┴────────────────────┴─────────────────┘
let ch = klickhouse::Client::connect("localhost:9000", Default::default()).await?;
let df: DataFrame = ...
// Deduce table schema from the dataframe
let table = polarhouse::ClickhouseTable::from_polars_schema(table_name, df.schema(), [])?;
// Create Clickhouse table corresponding to the Dataframe (optional)
table.create(&ch, TableCreateOptions { primary_keys: &["name"] , ..Default::default() }).await?;
// Insert dataframe contents into table
table.insert_df(df, &ch).await?;
let ch = klickhouse::Client::connect("localhost:9000", Default::default()).await?;
// Retrieve Clickhouse query results as a Dataframe.
let df: DataFrame = polarhouse::get_df_query(
klickhouse::SelectBuilder::new(table_name).select("*"),
Default::default(),
&ch,
).await?;
This is for now only a proof of concept.
An alternative solution would be to write an Arrow Database Connectivity driver for Clickhouse, and use Polars' ADBC support.
$ docker run --network host --rm --name clickhouse clickhouse/clickhouse-server:latest
$ cargo nextest run -r --nocapture
- Integers
- Floating points
- Strings
- Booleans
- Categorical (Polars) / Low cardinality (Clickhouse)
- Structs (Polars), which get flattened into Clickhouse, with fields names separated by
.
- Nullables
- Lists (Polars) / Arrays (Clickhouse)
- UUIDs (mapped to Strings in Polars)
- Arrays (Polars)
- DateTime
- Time
- Duration
- ...