googleapis/python-bigquery-dataframes

Support wildcard tables (and filter on `_TABLE_SUFFIX`) in `read_gbq` / `read_gbq_table`

tswast opened this issue · 2 comments

tswast commented

Is your feature request related to a problem? Please describe.

In https://cloud.google.com/bigquery/docs/create-machine-learning-model there is the following SQL:

SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'

I'd like to be able to represent all of this in Python without any SQL code. There are two problems right now:

  1. bigquery-public-data.google_analytics_sample.ga_sessions_* isn't supported as a table ID in read_gbq. This doesn't refer to any single table, so API requests based on the table ID will fail.

  2. Even if (1) were supported, it would try to copy all the data into a temp table. It would be best to be able to specify a filter on _TABLE_SUFFIX at data read time.

Describe the solution you'd like

bpd.read_gbq("bigquery-public-data.google_analytics_sample.ga_sessions_*") should work.

Also, somewhat inspired by the BigQuery Storage API, accept a row_restriction parameter to filter rows.

bpd.read_gbq(
    "bigquery-public-data.google_analytics_sample.ga_sessions_*",
    filters=[("_TABLE_SUFFIX", ">=", '20160801'), ("_TABLE_SUFFIX", "<=", '20170630')],
)

See: https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions.FIELDS.string.google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions.row_restriction

See also: "filters" parameter in pandas.read_parquet

Describe alternatives you've considered

SQL as an input to read_gbq works as an alternative right now.

Additional context

Related feature request on pandas-gbq:

tswast commented

@Genesis929 This is affecting more BQML tutorials. I think we should try and find a solution for this sooner rather than later. I think it may require translation to SQL similar to your filters implementation.