googleapis/python-bigquery

bug: `query_and_wait` drops unknown properties in the QueryJobConfig

tswast opened this issue · 2 comments

tswast commented

While integrating pandas-gbq with the new, faster `query_and_wait` method, a test failure revealed that some request body properties set in a custom job configuration are being dropped. See: https://github.com/googleapis/python-bigquery-pandas/pull/722/files/7739f41989c5d0effbaf66070c6b94b6c3840506#r1459296431

Steps to reproduce

  1. Create a QueryJobConfig with unknown or invalid parameters via from_api_repr()
  2. Pass the job_config to query_and_wait().
  3. Observe that the extra parameters are not sent.

Code example

import google.cloud.bigquery

client = google.cloud.bigquery.Client()
project_id = "swast-scratch"

# Build a QueryJobConfig from a resource containing a "copy" section,
# which a query job should not have.
job_config = google.cloud.bigquery.QueryJobConfig.from_api_repr(
    {
        "copy": {
            "sourceTable": {
                "projectId": project_id,
                "datasetId": "publicdata:samples",
                "tableId": "wikipedia",
            },
            "destinationTable": {
                "projectId": project_id,
                "datasetId": "publicdata:samples",
                "tableId": "wikipedia_copied",
            },
        }
    }
)

# Expected: an error about the invalid configuration. Actual: the query runs.
client.query_and_wait("select 1", job_config=job_config)

Stack trace

No stack trace is produced, even though the invalid configuration should cause an error.
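
For illustration, here is a minimal sketch of how this kind of silent drop can happen, assuming the request body is built only from the "query" section of the job configuration resource. The helpers `build_query_request` and `dropped_keys` and the plain-dict config are hypothetical stand-ins, not the library's actual code.

```python
import copy


def build_query_request(job_config_resource, query):
    """Hypothetical helper: build a request body from a job config resource.

    Only the "query" section is copied, so any other top-level key
    (such as "copy") never reaches the request.
    """
    request_body = copy.deepcopy(job_config_resource.get("query", {}))
    request_body["query"] = query
    return request_body


def dropped_keys(job_config_resource):
    """Return the top-level keys that build_query_request would ignore."""
    return sorted(key for key in job_config_resource if key != "query")


config_resource = {
    "copy": {"sourceTable": {"tableId": "wikipedia"}},
    "query": {"useLegacySql": False},
}
request = build_query_request(config_resource, "select 1")
ignored = dropped_keys(config_resource)
```

In this sketch the "copy" section ends up in `ignored` and never appears in `request`, so a server receiving the request would have nothing invalid to reject.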

I also wonder to what extent clients should verify the validity of requests. I think we intentionally avoid it in many cases, in order to reduce redundancy and keep the client lightweight.

tswast commented

Yeah, we do indeed. I was surprised not to see a server-side error for this, so I limited my client-side validation to only the most common mistakes (wrong type of config object).
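
A minimal sketch of that kind of check, assuming a hypothetical `verify_config_type` helper and stand-in config classes (none of these names come from the library):

```python
class QueryConfig:
    """Stand-in for a query job configuration (hypothetical)."""


class CopyConfig:
    """Stand-in for a copy job configuration (hypothetical)."""


def verify_config_type(job_config, expected_type):
    """Raise TypeError if job_config is not None and has the wrong type."""
    if job_config is not None and not isinstance(job_config, expected_type):
        raise TypeError(
            f"Expected an instance of {expected_type.__name__}, "
            f"got {type(job_config).__name__}"
        )


verify_config_type(QueryConfig(), QueryConfig)  # correct type: no error
verify_config_type(None, QueryConfig)  # None is allowed: no error

try:
    verify_config_type(CopyConfig(), QueryConfig)
except TypeError as exc:
    error_message = str(exc)
```

Note that a type check like this would not catch the case reported here, since the object is of the expected type and only its contents are wrong.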