GoogleCloudDataproc/spark-bigquery-connector

Best practice to deal with query parameters?

kohsuke opened this issue · 1 comment

I'm trying to run a parameterized query, à la https://cloud.google.com/bigquery/docs/parameterized-queries#java

```java
Dataset<Row> ds = spark.read().format("bigquery").option("query", "SELECT ... WHERE x > @cutoff").load();
```

I could be mistaken, but I couldn't find any support for passing in parameters. I want to avoid string manipulation for better protection against SQL injection. What's my best way forward?

I was hoping QueryJobConfiguration would provide a method that inlines parameters safely into the query string, but from the source code, it looks like that is not done on the client side at all (which makes sense).

I'm using Spark 3.3. It looks like Spark 3.4 added parameterized SQL query support, so maybe that is the path forward, but for the time being I'm stuck with Spark 3.3.

I'd like to see this question addressed in the README, or implicitly in the examples.

Please, please tell me there's a better way than String.format to deal with parameterized queries in 2023...

When reading from a query, as demonstrated in this issue, you provide BigQuery SQL that is executed as-is by the BigQuery query engine. We do not currently support parameterized queries. It is a feature we may add in the future.
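In the meantime, if you must inline values into the query string on the client side, doing it through a small escaping helper is safer than raw String.format. Below is a minimal sketch of that idea — the `InlineParams` class and its methods are hypothetical (not part of the connector or the BigQuery client), it covers only STRING and INT64 values, and its naive `replace` would misbehave if one parameter name is a prefix of another or if `@name` appears inside a string literal. Server-side binding, where available, remains the safer option.

```java
import java.util.Map;

// Hypothetical client-side workaround: inline @name parameters into the
// query string with explicit escaping, instead of raw String.format.
// Sketch only -- it handles just STRING and INT64 values.
public class InlineParams {

    // Render a value as a BigQuery Standard SQL literal.
    static String literal(Object value) {
        if (value instanceof Long || value instanceof Integer) {
            return value.toString(); // INT64 needs no quoting
        }
        if (value instanceof String) {
            String s = (String) value;
            // Escape backslashes first, then single quotes.
            return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    // Replace each @name marker with the escaped literal for that name.
    static String inline(String query, Map<String, ?> params) {
        String result = query;
        for (Map.Entry<String, ?> e : params.entrySet()) {
            result = result.replace("@" + e.getKey(), literal(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String sql = inline(
            "SELECT name FROM ds.t WHERE x > @cutoff AND owner = @owner",
            Map.of("cutoff", 42L, "owner", "O'Brien"));
        System.out.println(sql);
        // SELECT name FROM ds.t WHERE x > 42 AND owner = 'O\'Brien'
    }
}
```

The resulting string could then be passed to `.option("query", sql)` as in the snippet above.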