IMSMWU/RClickhouse

test/enhance join support

Closed this issue · 2 comments

dplyr and clickhouse support joins, but click house does not use the standard SQL syntax. Therefore, it might be the case, that we have to fix this.

Clickhouse Join syntax: [GLOBAL] ANY|ALL INNER|LEFT [OUTER] JOIN (subquery)|table USING columns_list.

SQL Join syntax: [[FULL|LEFT|RIGHT [OUTER]]|INNER] JOIN (subquery)|table ON columns_list |USING columns_list

  • Clickhouse supports special ANY/ALL settings - would be nice if we allow the user to use this by adding a new parameter to the method's signature.
  • write tests
  • rewrite select part. seems like clickhouse does not support table aliases (but subquery aliases). sql_join.ClickhouseConnection should not add aliases for the first table (LEFT_TABLE). So we have to rewrite it that dplyr generates two subqueries and then add aliases (ClickHouse/ClickHouse#528)
  • click house does not support ON syntax, but dplyr: SQL joins always use the ON ... syntax, avoiding USING ... even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you'd write by hand, eliminating a layer or two of subqueries (#2333)
  • Using ALL corresponds to the normal JOIN semantic from standard SQL: add all keyword to all joins per default

RIGHT and FULL joins are actually supported. But RIGHT join is not equivalent to the flipped LEFT join - you still want your "right" table to be small to fit in memory. Also, there is a bug in the implementation that is fixed in master, but not yet released.