tidyverse/glue

glue_sql handling of empty vectors

Closed this issue ยท 5 comments

Post upgrade to glue version 1.7.0, we encountered a breaking change in how glue_sql handles empty vectors introduced on #292 . Previously, empty vectors were collapsed to "NULL" in SQL queries, which was functional for our use case. However, since the update, empty vectors are now collapsed to an empty string, resulting in syntax errors in SQL queries with IN clauses.

This change impacts all database queries where we dynamically generate IN clauses based on possibly empty vectors. The new behavior leads to malformed SQL queries and runtime errors, significantly affecting our data retrieval processes.

Could we consider reintroducing the previous behavior of collapsing empty vectors to "NULL" in SQL queries, or provide an option to toggle between the two behaviors? This would allow for backward compatibility and flexibility for different use cases.

We've implemented a workaround where we manually check for empty vectors and adjust the SQL query accordingly. However, this adds complexity and boilerplate code, which was elegantly handled by glue_sql in previous versions.

Mea culpa: I opened the original issue (#272) and now I think @jasonheffnerpsu is correct for an expanded length-zero vector of SQL literals, as with the case of:

x <- character(0)
glue_sql("select foo from bar where baz in ({x*})")

This yields select foo from bar where baz in (null) which I now believe is indeed SQL-idiomatic. (My initial error was driven by using a too-lenient analytics SQL engine, I think.)

But, when asterisk expansion is used for SQL symbols, as with:

glue_sql("select {`x`*} from bar where ...")

... there seem to be two possible options:

select from bar where ...       <-- yields a zero-col table via PostgreSQL
select null from bar where ...  <-- yields a one-col table via PostgreSQL

I think the zero-col result is correct here, since the number of columns returned (0) matches the length of x (0).

So, null seems correct for {x*} but an empty string seems correct (to me, at least) for {`x`*}.

I think the easiest fix would be to simply revert to the previous behaviour.

I am encountering a similar issue since glue 1.7.0: not for empty vectors, but for variables that are NULL. Previously (glue 1.6.0) those (here rights_holder) were interpreted as NULL:

library(glue)
library(DBI)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
rights_holder <- NULL
sql <- "SELECT {rights_holder} AS rights_holder, * FROM data"
glue(sql)
glue_sql(sql, .con = con)
#> <SQL> SELECT NULL AS rights_holder, * FROM data

Created on 2024-01-18 with reprex v2.1.0

Now (glue 1.7.0), the result of glue_sql() is an empty <SQL> class. glue() does not return anything:

library(glue)
library(DBI)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
rights_holder <- NULL
sql <- "SELECT {rights_holder} AS rights_holder, * FROM data"
glue(sql)
glue_sql(sql, .con = con)
#> <SQL>

Created on 2024-01-18 with reprex v2.1.0

I assume this will be fixed when the old behaviour is restored.

@peterdesmet I don't think so, because the R object that is best equivalent to SQL's NULL is NA:

library(glue)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
glue_sql("SELECT {NA} AS rights_holder, * FROM data", .con = con)
#> <SQL> SELECT NULL AS rights_holder, * FROM data

Created on 2024-02-29 with reprex v2.1.0

Ok, that makes sense.