dbSendQuery documentation unclear regarding when SQL request is executed vs prepared
tomcnolan opened this issue · 40 comments
Hi folks, my team at Teradata is implementing a DBI Driver for the Teradata Database.
The documentation for dbSendQuery says:
The dbSendQuery() method only submits and synchronously executes the SQL query to the database engine.
The documentation for dbBind says:
For parametrized or prepared statements, the dbSendQuery() and dbSendStatement() functions can be called with statements that contain placeholders for values. The dbBind() function binds these placeholders to actual values ...
The documentation appears to be saying that the dbSendQuery method will execute a non-parameterized SQL request, and will prepare (but not execute) a parameterized SQL request.
The documentation does not indicate how a DBI Driver should distinguish between a parameterized versus non-parameterized SQL request, in order to decide how to transmit the SQL request to the backend database in the appropriate manner (prepare versus execute).
In other database APIs, such as JDBC, there are separate API calls for prepare versus execute, so expected driver behavior is obvious.
My expectation is that the dbSendQuery documentation can be augmented to explain the decision-making criteria for whether the specified SQL request is to be prepared versus executed.
In particular, the documentation should indicate whether this behavior is driver-specific. If there is a recommended way for the application to indicate to the driver whether to prepare versus execute, such as an extra named parameter for the dbSendQuery, then the documentation should mention that.
Thanks for the feedback.
RPostgres, RMariaDB, RSQLite and most likely odbc use the same code path both for parametrized and non-parametrized SQL. The SQL will be parsed for placeholders by the database library. Would that be an option for Teradata?
I remember problems with RMariaDB where some queries could not be executed as parametrized SQL, we had to work around and revert to a different code path after failure. Perhaps you could look for params = list()
or params = NULL
in the dbSendQuery()
call?
Happy to adapt/enhance DBItest to this use case.
The SQL will be parsed for placeholders by the database library. Would that be an option for Teradata?
We would like to avoid that approach if at all possible. We try to avoid parsing SQL in our drivers.
some queries could not be executed as parametrized SQL, we had to work around and revert to a different code path after failure.
This is also true for the Teradata Database. Some kinds of SQL requests cannot be prepared. Unfortunately, this approach won't work for us, because the same error code is returned for syntax errors and errors due to a SQL request that can't be prepared.
Perhaps you could look for
params = list()
orparams = NULL
in thedbSendQuery()
call?
Yes, this approach could work. This approach would have 3 use cases:
-
dbSendQuery()
with noparams
argument -- the driver tells the database to execute the non-parameterized SQL request. -
dbSendQuery()
withparams=NULL
argument -- the driver tells the database to prepare (but not execute) the parameterized SQL request. The app must subsequently execute the prepared request withdbBind()
-
dbSendQuery()
with non-NULLparams
argument -- the driver tells the database to prepare and execute the parameterized SQL request with the specified parameter values.
Thanks. Unfortunately, the cases 1 and 2 are indistinguishable. How about:
-
dbSendQuery(params = NULL)
: Prepare but wait fordbBind()
-
dbSendQuery(params = list())
: Execute right away (possibly using a different code path) -
dbSendQuery(params = list(...))
: Prepare and bind
This would be consistent with existing implementations and, I guess, achieve what you're looking for.?
How about:
dbSendQuery(params = NULL)
: Prepare but wait fordbBind()
dbSendQuery(params = list())
: Execute right away (possibly using a different code path)dbSendQuery(params = list(...))
: Prepare and bind
Yes, that should work.
The documentation does not currently talk about the expected behavior for non-parameterized SQL requests.
Can the documentation for dbSendQuery
be augmented to discuss use case # 2 for immediately executing a non-parameterized SQL request?
dbSendQuery(params = list())
: Execute right away (possibly using a different code path)
Thanks!
Yes, we need to add test cases and implementations in the different backends. I'll consider this for the upcoming update in February.
This may be even more relevant for dbSendStatement()
.
@krlmlr I have implemented dbSendQuery
in the Teradata driver and have encountered a problem with our proposed solution.
Many other DBI methods such as dbExecute
funnel down to calling dbSendQuery
and do not explicitly specify a params
argument and implicitly let params
default to NULL
. Those methods expect the SQL request to be executed, not just prepared without being executed.
Therefore, given how other DBI methods call dbSendQuery
, the default behavior for dbSendQuery
when params
is not specified (meaning params
is NULL
) must be to execute the SQL request.
We should state generally that params
must be NULL
for a non-parameterized SQL request, and params
must be non-NULL
for a parameterized SQL request. That distinction actually makes sense, and is intuitive.
dbSendQuery
behavior must be as follows:
dbSendQuery(params = NULL)
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified.dbSendQuery(params = list())
: Prepare but do not execute the parameterized SQL request. Assumes the parameterized SQL request will be executed with a subsequent call todbBind()
.dbSendQuery(params = list(...))
: Prepare and execute the parameterized SQL request with the specified bind parameter values.
Thanks!
A few loose thoughts:
dbExecute()
should forward to dbSendStatement()
-- if this is the same operation in Teradata, dbSendStatement()
can just forward to dbSendQuery()
. Do you really need to override dbExecute()
?
I skimmed the DBI specification, time of execution/query doesn't currently seem to be specified indeed. It can't be "too early", though: it's currently not mandatory for the caller to indicate whether the query contains placeholders. This is done with a call to dbBind()
-- if that call is missing, dbFetch()
or dbGetRowsAffected()
work if the query doesn't contain placeholders, and throw an error otherwise.
NULL
has the same semantics as "argument absent". You mentioned dbExecute()
where the expectation is that the statement is executed immediately -- I'd second that, but dbExecute(params = NULL)
could forward to dbSendStatement(params = list())
to achieve the desired behavior.
Agree to update the documentation once this is adopted.
dbExecute()
should forward todbSendStatement()
-- if this is the same operation in Teradata,
Yes dbExecute
should forward to dbSendStatement
Yes they are the same operation for Teradata.
dbSendStatement() can just forward to dbSendQuery().
Yes.
Do you really need to override dbExecute() ?
No, we do not need to override dbExecute
. We do not want to override dbExecute
.
We want to be able to inherit the standard dbExecute
implementation in DBI.
dbExecute(params = NULL) could forward to dbSendStatement(params = list()) to achieve the desired behavior.
It could, but it does not.
If that behavior was added in a future release of DBI, then it would produce different behavior than existing old releases of DBI.
Our goal for the Teradata driver is to be compatible with DBI >= 0.8
Therefore, we need the meaning of the params
argument to be consistent across all releases of DBI >= 0.8
dbExecute
, dbSendStatement
, and dbSendQuery
behavior all must treat the params
argument in the same manner:
-
params = NULL
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified. -
params = list()
: Prepare but do not execute the parameterized SQL request. Assumes the parameterized SQL request is prepared withdbSendQuery
and will be executed with a subsequent call todbBind
. Whenparams = list()
is specified fordbExecute
ordbSendStatement
the parameterized SQL request is prepared but cannot subsequently be used withdbBind
, effectively acting only as a SQL syntax validation check. -
params = list(...)
: Prepare and execute the parameterized SQL request with the specified bind parameter values.
We have already implemented this behavior in the Teradata DBI driver, and it works well with DBI 0.8.
- We inherit the standard
dbExecute
from DBI, which forwards theparams
argument todbSendStatement
. - We inherit the standard
dbSendStatement
from DBI, which forwards theparams
argument todbSendQuery
. - We implement
dbSendQuery
and we interpret theparams
argument as listed above.
Thanks. I hear you, but the suggested treatment of params = NULL
will not work because NULL
has the same semantics as "argument absent". If no params are given, dbSendQuery()
and dbSendStatement()
still accepts a query/statement with placeholders, and will (in the general case) create a result set that waits for a dbBind()
call.
Does your implementation pass the tests in DBItest?
It would be great to achieve consensus here. Can we find a way that doesn't break current implementations of RMariaDB, RPostgres and RSQLite and achieves what you are looking for?
Compatibility with DBI 0.8 will require overriding a few methods, but can be done. Alternatively, you could require a recent version of DBI.
If you are willing to change the default implementation for DBI methods, then I recommend that the distinction between preparing and executing a parameterized SQL request be made explicit by introducing two new methods: dbExecuteStatement
and dbExecuteQuery
.
In addition, the DBI documentation must be changed to clearly articulate the behavior of each of the DBI methods with respect to parameterized versus non-parameterized SQL.
Here is a proposal that I believe is compatible with existing DBI drivers...
(Existing method, modified documentation, modified default implementation)
dbExecute(conn, statement, ...)
is intended for executing either non-parameterized or parameterized SQL requests.
dbExecute
comes with a default implementation that calls dbExecuteStatement
then dbGetRowsAffected
followed by dbClearResult
.
dbExecute
has an optional params
argument interpreted as follows:
params = NULL
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified.params = list(...)
: Execute the parameterized SQL request with the specified bind parameter values.
(New method)
dbExecuteStatement(conn, statement, ...)
is intended for executing either non-parameterized or parameterized SQL requests.
dbExecuteStatement
comes with a default implementation that calls dbSendStatement
, suitable for backends that do not distinguish between prepare and execute operations.
Backends with separate prepare versus execute operations should override dbExecuteStatement
.
dbExecuteStatement
has an optional params
argument interpreted as follows:
params = NULL
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified.params = list(...)
: Execute the parameterized SQL request with the specified bind parameter values.
(Existing method, modified documentation, unchanged default implementation)
dbSendStatement(conn, statement, ...)
is intended for preparing or executing parameterized SQL requests.
dbSendStatement
comes with a default implementation that calls dbSendQuery
for backends that do not distinguish between DML statements and queries.
params = NULL
: Prepare but do not execute the parameterized SQL request. This is the default whenparams
is not specified. The request can be executed with a subsequent call todbBind
.params = list(...)
: Execute the parameterized SQL request with the specified bind parameter values.
(Existing method, modified documentation, modified default implementation)
dbGetQuery(conn, statement, ...)
is intended for executing either non-parameterized or parameterized SQL requests.
dbGetQuery
comes with a default implementation that calls dbExecuteQuery
then dbFetch
followed by dbClearResult
.
dbGetQuery
has an optional params
argument interpreted as follows:
params = NULL
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified.params = list(...)
: Execute the parameterized SQL request with the specified bind parameter values.
(New method)
dbExecuteQuery(conn, statement, ...)
is intended for executing either non-parameterized or parameterized SQL requests.
dbExecuteQuery
comes with a default implementation that calls dbSendQuery
, suitable for backends that do not distinguish between prepare and execute operations.
Back ends with separate prepare versus execute operations should override dbExecuteQuery
.
dbExecuteQuery
has an optional params
argument interpreted as follows:
params = NULL
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified.params = list(...)
: Execute the parameterized SQL request with the specified bind parameter values.
(Existing method, modified documentation)
dbSendQuery(conn, statement, ...)
is intended for preparing or executing parameterized SQL requests.
params = NULL
: Prepare but do not execute the parameterized SQL request. This is the default whenparams
is not specified. The request can be executed with a subsequent call todbBind
.params = list(...)
: Execute the parameterized SQL request with the specified bind parameter values.
Thanks. I was contemplating new interface methods, it's a bit of work but seems worth it.
I'm not sure I follow your proposal entirely, I'll draft mine and we see if we can meet halfway.
I'm proposing two new methods, dbSendQueryNow()
and dbSendStatementNow()
.
- Can accept parameters, but guarantee immediate execution. Failure if not enough parameters supplied.
- Result set object may accept new/updated parameters through
dbBind()
, but isn't required to - Default implementation calls
dbSendXXX()
+dbBind()
(or justdbSendXXX(params = ...)
) - Default
dbGetQuery()
anddbExecute()
are modified to calldbSendXXXNow()
Is the dbExecuteXXX()
from your proposal the same as dbSendXXXNow()
from mine? I'm confused by the naming scheme and the similarity to the existing dbExecute()
.
@krlmlr Yes, your proposed methods sound similar to mine.
Your proposed dbSendQueryNow
sounds like my proposed dbExecuteQuery
.
Your proposed dbSendStatementNow
sounds like my proposed dbExecuteStatement
.
@tomcnolan: I feel that new generics are a bit too much for this problem. Did we discuss an optional argument to dbSendQuery()
? This design seems to work for {odbc}: r-dbi/odbc#272.
@krlmlr
I read r-dbi/odbc issues 163 and 127 which indicate that the "prepare versus execute" problem occurs for other drivers also.
Clearly people expect dbExecute
and dbSendStatement
to actually execute the SQL request, not just prepare it.
I feel that new generics are a bit too much for this problem.
I disagree.
DBI lacks a distinction between preparing and executing a SQL request. Other database access APIs such as ODBC and JDBC offer separate API functions for those two operations.
Did we discuss an optional argument to
dbSendQuery()
?
No, we have not discussed that approach yet.
In my opinion, adding a non-default immediate = TRUE
argument is problematic and will likely lead to behavior differences between DBI drivers.
We have implemented the following behavior in the DBI driver for Teradata:
dbExecute
, dbSendStatement
, dbGetQuery
, and dbSendQuery
all treat the params
argument in the same manner:
params = NULL
: Execute the non-parameterized SQL request. This is the default whenparams
is not specified.params = list()
: Prepare but do not execute the parameterized SQL request. Assumes the parameterized SQL request is prepared withdbSendStatement
ordbSendQuery
and will be executed with a subsequent call todbBind
. Whenparams = list()
is specified fordbExecute
ordbGetQuery
the parameterized SQL request is prepared and noDBIResult
is returned sodbBind
cannot subsequently be used, effectively acting only as a SQL syntax validation check.params = list(...)
: Prepare and execute the parameterized SQL request with the specified bind parameter values.
@tomcnolan it's not feasible to change the meaning of NULL
because it would break backward compatibility. Another option would be to use a sentinel object:
params = NULL
: current behaviour: prepare but don't execute.params = none()
: execute immediately.params = list(...)
: current behaviour: prepare, bind, then execute.
It's not clear that is better than adding new immediate
argument, although it does make it impossible to both supply parameters and execute immediately.
I think adding new generics here is too heavy.
@hadley @krlmlr People expect dbExecute
and dbGetQuery
to actually execute a non-parameterized SQL request.
People do not expect dbExecute
and dbGetQuery
to just prepare but not execute a non-parameterized SQL request.
dbExecute
and dbGetQuery
don't return a DBIResult
object, so there is no way to do a subsequent dbBind
.
It seems like a really bad workaround to require people to specify immediate = TRUE
for calls to dbExecute
and dbGetQuery
with a non-parameterized SQL request.
If you are committed to the immediate
parameter idea, then the default should be immediate = TRUE
for dbExecute
and dbGetQuery
.
@tomcnolan: dbGetQuery()
and dbExecute()
do run the query immediately (parametrized or not), it's just that the (internal) parametrized API is always used currently. I think we can adapt the DBI spec to explicitly state that dbGetQuery()
and dbExecute()
are allowed use the "immediate" internal API. This would be reflected by default values: immediate = NA
for dbGetQuery()
and dbExecute()
, which means the implementation is free to choose, immediate = FALSE
for dbSendQuery()
and dbSendStatement()
. How does that sound?
Summary
Different interfaces for prepared queries and immediate execution found in (at least) Teradata, ODBC, MSSQL and MySQL/MariaDB. The current default is to always use prepared queries. We need to offer a clear and unambiguous way to access the "immediate" API.
I agree with @hadley that we shouldn't make backward-incompatible changes at this point. I also agree with @tomcnolan that we should make it easy for backend implementers to support the new interface so that no behavioral differences occur. Tests will be one aspect of this, but these might be difficult to implement in a backend-agnostic way. I also think we need to differentiate between dbSend...()
and dbGetQuery()
/dbExecute()
, but we also want to have a consistent interface for all functions, possibly with different defaults.
The "immediate" API is optional, but important for some queries/backends. I rather like the idea of an optional immediate
argument now, this doesn't overload the params
argument.
Most users will call dbGetQuery()
or dbExecute()
to access the database. Here, the query parameters are known up front, and the implementation can make its own choice of the interface to use (immediate = NA
). The DBI spec should not constrain the backend to use a particular mode of operation here, also we cannot test it in a backend-agnostic way. In addition, the immediate
argument is a hint to specify what API should be used internally, overridable by the user.
dbSendQuery()
and dbSendStatement()
are methods that are rarely called by end users. As stated, we can't change the default behavior, but we can support a hint to call the non-parametrized API.
@krlmlr Yes, I agree with this proposal. Thanks!
Will this require a new version of DBI?
How soon can the official DBI documentation be updated to reflect this?
@krlmlr I have a follow-up question regarding this topic:
This would be reflected by default values:
immediate = NA
fordbGetQuery()
anddbExecute()
, which means the implementation is free to choose
What is the expected behavior from the driver when the app specifies immediate = FALSE
for dbGetQuery
or dbExecute
? Is the driver supposed to stop with an error?
I'll add it to the spec very soon, there's a related PR in the {odbc} package.
The current implementation for {odbc}, {RSQLite}, {RMariaDB} and {RPostgres} always uses the corresponding "prepared query" API, even for queries/statements without parameters. I'd argue that immediate = FALSE
shouldn't be an error but rather use the "prepared query" API as requested. @jimhester suggests that immediate = FALSE
should remain the (internal) default for the {odbc} package even after the change (r-dbi/odbc#272 (comment)).
Can Teradata use the "prepared query" API for queries without parameters?
For the Teradata DBI Driver, we have added the immediate
parameter for the dbExecute
, dbGetQuery
, dbSendStatement
, and dbSendQuery
methods as discussed above.
The Teradata DBI Driver will behave as follows for the dbExecute
, dbGetQuery
, dbSendStatement
, and dbSendQuery
methods:
- When bound parameter values are specified with the
params
argument, theimmediate
argument is ignored, and the SQL request is executed immediately. - When no bound parameter values are specified, and
immediate = NA
orimmediate = TRUE
is specified, then the SQL request is executed immediately.immediate = NA
is the default for thedbExecute
anddbGetQuery
methods. - When no bound parameter values are specified, and
immediate = FALSE
is specified, then the SQL request is prepared but not executed.immediate = FALSE
is the default for thedbSendStatement
anddbSendQuery
methods.
The only remaining question is the expected behavior of the dbExecute
and dbGetQuery
methods when no bound parameter values are specified and immediate = FALSE
is specified. Earlier you said:
dbGetQuery()
anddbExecute()
do run the query immediately (parametrized or not)
So it seems reasonable that dbExecute
and dbGetQuery
should stop with an error if immediate = FALSE
is specified. Please provide some guidance on this issue. Thanks!
Can Teradata use the "prepared query" API for queries without parameters?
The Teradata Database allows almost all non-parameterized SQL requests to be prepared.
Some (obviously non-parameterized) DDL commands cannot be prepared, so the Teradata DBI Driver cannot blindly attempt to prepare all SQL requests.
The Teradata DBI Driver must assume by default that an app intends to execute a SQL request, and only prepare a SQL request if it's unambiguously certain that the app intended to prepare a SQL request.
I think the following implementation sketch of dbGetQuery()
should do the job (ignoring S4 specifics):
dbGetQuery <- function(..., params = NULL, immediate = NA) {
immediate <- decide_immediate(immediate, params, ...)
res <- dbSendQuery(..., params = params, immediate = immediate)
on.exit(dbClearResult(res))
dbFetch(res, ...)
}
This doesn't differ much from the current implementation, only the immediate
argument is new. dbSendQuery()
is still supposed to return a result set object, only dbFetch()
actually returns data.
Similarly for dbExecute()
.
Ideally you shouldn't even need to override dbGetQuery()
, but you may wish to do so if it's important to always run non-parametrized queries with the "direct" API. Depending on the backend this may be the riskier option (thinking about subtle issues like data type conversion), which is why I agree that the other backends should continue to use the "prepared" API by default.
Thank you for your patience and for your very valuable input!
@krlmlr
The Teradata DBI Driver implementation of the dbGetQuery
method looks similar to your example code above; however, please note that if the app specifies immediate = FALSE
when calling dbGetQuery
then the request will only be prepared and not executed, which will cause the call to dbFetch
to fail.
We still need guidance as to the expected behavior from dbGetQuery
when no bound parameter values are specified and immediate = FALSE
is specified. Thanks!
Here is what our implementation of dbGetQuery
looks like:
setMethod ("dbGetQuery", signature ("TeradataConnection", "character"), function (conn, statement, n = -1, params = NULL, immediate = NA, ...) {
# Optional arguments params and immediate appear before dots in order to accommodate partial argument matching.
if (conn@m_bTraceLog) {
cat (paste0 ("> enter dbGetQuery ", conn, " immediate=", immediate, " ", statement, "\n"))
on.exit (cat (paste0 ("< leave dbGetQuery ", conn, " immediate=", immediate, " ", statement, "\n")))
}
res <- DBI::dbSendQuery (conn, statement, params, immediate)
tryCatch ({
return (DBI::dbFetch (res, n = n))
}, finally = {
DBI::dbClearResult (res)
}) # end finally
}) # end method dbGetQuery
What needs to be done to actually execute the request in the immediate = FALSE
case? Why would the subsequent dbFetch()
call fail?
Have you seen DBIlog, which is a generic logging wrapper for DBI connections? I'm planning to send it to CRAN soon.
The DBI documentation says that dbFetch
is supposed to stop with an error if dbFetch
is called before dbBind
. On this page https://dbi.r-dbi.org/reference/dbbind
Until dbBind() has been called, the returned result set object has the following behavior:
- dbFetch() raises an error (for dbSendQuery())
In both your version of dbGetQuery
and our version of dbGetQuery
, when no params
are specified and immediate = FALSE
is specified, the dbGetQuery
method calls dbSendQuery
to just prepare (but not execute) the SQL request and then calls the dbFetch
method without having called the dbBind
method.
The dbFetch
method is supposed to stop with an error in that situation.
I see now that the text is unclear -- it's referring only to queries that contain placeholders. For queries that don't contain placeholders, the dbFetch()
call is valid even without calling dbBind()
:
res <- dbSendQuery()
dbFetch(res)
dbClearResult(res)
We might need a basic documentation that describes these workflows, for both parametrized and non-parametrized queries.
OK, let's assume that the SQL request is parameterized. What is the expected behavior of the dbGetQuery
method when called with a parameterized SQL request, but no bound parameter values, and immediate = FALSE
?
In that situation, both your version of dbGetQuery
and our version of dbGetQuery
will fail when calling the dbFetch
method.
Yes, this is the expected behavior, regardless of the value of the immediate
argument.
library(DBI)
conn <- dbConnect(RSQLite::SQLite())
dbGetQuery(conn, "SELECT ? AS a", params = list(1))
#> a
#> 1 1
dbGetQuery(conn, "SELECT ? AS a")
#> Error: Query needs to be bound before fetching
Created on 2019-04-23 by the reprex package (v0.2.1.9000)
@krlmlr @hadley
We have discovered that dbplyr expects dbSendQuery
to immediately execute the SQL request and calls dbFetch
after dbSendQuery
-- please note function db_collect.DBIConnection
:
https://github.com/tidyverse/dbplyr/blob/master/R/verb-compute.R#L118
We suspect that other existing libraries, like dbplyr, also expect dbSendQuery
to immediately execute the SQL request and call dbFetch
after dbSendQuery
.
The Teradata DBI Driver must be able to interoperate with dbplyr and other existing libraries that expect immediate execution from dbSendQuery
, so the Teradata implementation of dbSendStatement
and dbSendQuery
will have a default argument immediate = NA
and the Teradata DBI Driver will assume immediate execution for the default immediate = NA
.
The Teradata DBI Driver will behave as follows for the dbExecute
, dbGetQuery
, dbSendStatement
, and dbSendQuery
methods:
- When bound parameter values are specified with the
params
argument, theimmediate
argument is ignored, and the SQL request is executed immediately. - Argument
immediate = NA
is the default for thedbExecute
,dbGetQuery
,dbSendStatement
, anddbSendQuery
methods. - When no bound parameter values are specified, and
immediate = NA
orimmediate = TRUE
is specified, then the SQL request is executed immediately. - When no bound parameter values are specified, and
immediate = FALSE
is specified, then the SQL request is prepared but not executed. - When no bound parameter values are specified, and
immediate = FALSE
is specified, thendbGetQuery
will fail with an error due to fetch attempt before bind.
dbFetch()
can be run immediately after dbSendQuery()
for non-parametrized queries. The "Examples" section in ?dbFetch
gives a hint, but I agree it's not very explicit. The documentation needs improvement here.
The following reprex gives an overview over the expected behavior (irrespective of the immediate
argument):
library(DBI)
con <- dbConnect(RSQLite::SQLite())
res <- dbSendQuery(con, "SELECT 1 AS a")
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
res <- dbSendQuery(con, "SELECT 1 AS a")
dbBind(res, params = list())
#> Error: Query does not require parameters.
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
res <- dbSendQuery(con, "SELECT ? AS a")
dbBind(res, params = list(1))
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
res <- dbSendQuery(con, "SELECT ? AS a")
dbFetch(res)
#> Error: Query needs to be bound before fetching
dbClearResult(res)
dbDisconnect(con)
Created on 2019-04-24 by the reprex package (v0.2.1.9000)
dbBind()
is required if the query has parameters, and also throws an error if the query doesn't require parameters.
Currently, dbSendQuery()
is required to raise an error if the query syntax is invalid: https://dbi.r-dbi.org/reference/dbsendquery#value. Would it help if that requirement is lifted?
Let me try to rephrase the problem, please correct me if I'm mistaken.
- The query and all parameters should be sent "in one packet" to Teradata.
- Currently, there's no way to tell from the
dbSendQuery()
call if the query will contain parameters. Analysis of the query text is not an option. - Some non-parametrized queries even require a different "direct" API. However, in principle, the "parametrized" API can be used for non-parametrized queries.
@krlmlr Yes, your points 1, 2, 3 are correct.
For the Teradata DBI Driver to provide this behavior from your example, dbSendQuery
must attempt to execute the SQL request. The driver would not know before execution whether the request contains parameter markers or not.
res <- dbSendQuery(con, "SELECT 1 AS a")
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
Per your point 3, the driver cannot blindly attempt to prepare all SQL requests, because the Teradata Database supports a prepare operation for most, but not every, kind of SQL request. The error returned by the Teradata Database for an unsupported prepare operation is the same as the generic syntax error.
The Teradata DBI Driver needs explicit unambiguous direction from the app as to whether the app wants to prepare versus the app wants to execute. Note that this problem doesn't arise in other language database APIs such the JDBC API which offers non-parameterized SQL functionality via Connection.createStatement
and Statement.execute
separately from parameterized SQL functionality via Connection.prepareStatement
and PreparedStatement.execute
.
Thanks.
It feels that lifting the requirement that dbSendQuery()
checks the query syntax would help, even without introducing an immediate
argument (see further below).
When to send the query to the database?
The Teradata DBI Driver should send the query to the database on one of the following events:
dbSendQuery()
is called with theparams
argument set- if query parameters are inconsistent, the database throws an error
params = list()
is a supported mode of operation
dbFetch()
is called on the result returned bydbSendQuery()
: the query is sent to the database, indicating that no parameters are given- if parameters are missing, the database checks and throws an error
dbBind()
is called: the query and all parameters are sent to the database- if query parameters are inconsistent, the database throws an error
The following example illustrates these modes of operation:
library(DBI)
# Database communication occurs after <<<<< lines
con <- dbConnect(RSQLite::SQLite())
## <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
res <- dbSendQuery(con, "SELECT ? AS a", params = list(1))
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
res <- dbSendQuery(con, "SELECT 1 AS a")
## <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
res <- dbSendQuery(con, "SELECT ? AS a")
## <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
dbBind(res, params = list(1))
dbFetch(res)
#> a
#> 1 1
dbClearResult(res)
Created on 2019-04-25 by the reprex package (v0.2.1.9000)
As per your requirement, no client-side checking occurs, and the query and all parameters are sent in at the same time. No need to introduce new arguments to dbSendQuery()
for now. The only behavior that is not here supported would be the "eager error checking" currently in place, I'm happy to do away with it:
library(DBI)
con <- dbConnect(RSQLite::SQLite())
dbSendQuery(con, "SELECTT ? AS a")
#> Error: near "SELECTT": syntax error
Created on 2019-04-25 by the reprex package (v0.2.1.9000)
It feels like dbSendQuery()
is a misnomer here, in the sense that it does not always sends the query to the database. We're bound to use that name and the behavior for compatibility reasons, but we should update documentation to reflect this detail.
Most users will/should use dbGetQuery()
and dbExecute()
anyway, most of these subtleties don't occur there.
"Prepared" vs. "direct" API
This would come in addition to the behavior described above. It would only hint at the choice of the API. In particular, dbSendQuery(immediate = TRUE)
could also send the query to the database right away, because no parameters need to be waited for.
It would help if you please could point me to the upstream API the Teradata DBI Driver is using, and perhaps to the DBI Driver's source code.
From my perspective, the duality of dbSendQuery
to either execute or prepare a query was always strange. I think the cleanest solution would be to add a dbPrepare
method, which could be called by a default dbSendQuery
implementation if parameters are given.
Unfortunately, that ship has sailed now -- I'd rather not add new methods to DBI at this time.
DBI 1.1.0 is on CRAN now. Thanks again for your input!
What we plan to do in DuckDB is always internally prepare statements, even if no parameters are given.
This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.