Querying views returns no rows
samterfa opened this issue · 5 comments
Thank you for the super helpful package! I have found that I can query and return rows from a BigQuery table fine with this package but I'm unable to return rows from a BigQuery view. I just get an empty tibble back using the cran version and the github version of this package. I have verified that I can query the view in the console using the code generated by dplyr::show_query()
. The reprex below was adapted from an issue filed in another repo which was never submitted here that I could find (I've redacted the project id). Am I missing something?
Thank you!
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(bigrquery)
bq_auth(path = glue::glue('~/.creds/bigquery.json'))
project_id <-
bq_test_project()
conn <-
bigrquery::dbConnect(
bigrquery::bigquery(),
project = project_id,
billing = project_id
)
dataset <- glue::glue('{project_id}.test_dataset')
if (bq_dataset_exists(dataset))
{
bq_dataset_delete(dataset, delete_contents = TRUE)
}
bq_dataset_create(dataset)
#> <bq_dataset> PROJECT_ID.test_dataset
# Create mtcars table
mtcars_table <- glue::glue('{dataset}.mtcars')
if (dbExistsTable(conn, mtcars_table))
{
dbRemoveTable(conn, mtcars_table)
}
bq_table_create(mtcars_table, mtcars)
#> <bq_table> PROJECT_ID.test_dataset.mtcars
bq_table_upload(mtcars_table, mtcars)
# Verify we can query mtcars table.
tbl(conn, mtcars_table) |> collect() |> head(5)
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> # A tibble: 5 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 2 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 3 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# Create mtcars view based on mtcars table
mtcars_view <- glue::glue("{dataset}.mtcars_view")
if (dbExistsTable(conn, mtcars_view))
{
dbRemoveTable(conn, mtcars_view)
}
bq_dataset_query(x = dataset, query = glue::glue('CREATE VIEW mtcars_view AS SELECT mpg, cyl, disp FROM `{dataset}.mtcars`'))
#> <bq_table> PROJECT_ID.test_dataset.mtcars_view
# mtcars table and view exist
bq_dataset_tables(x = dataset)
#> [[1]]
#> <bq_table> PROJECT_ID.test_dataset.mtcars
#>
#> [[2]]
#> <bq_table> PROJECT_ID.test_dataset.mtcars_view
# Query mtcars view. Empty result.
tbl(conn, mtcars_view) |> collect() |> head(5)
#> # A tibble: 0 × 3
#> # … with 3 variables: mpg <dbl>, cyl <dbl>, disp <dbl>
This happened to me too. Using dbGetQuery works, but not the dplyr wrapper.
Somewhat more minimal reprex:
library(dplyr, warn.conflicts = FALSE)
library(bigrquery)
options(gargle_oauth_email = TRUE)
ds <- bq_test_dataset()
#> ℹ The bigrquery package is using a cached token for 'hadley@posit.co'.
#> Auto-refreshing stale OAuth token.
con <- DBI::dbConnect(ds)
DBI::dbWriteTable(con, "mtcars", mtcars)
sql <- glue::glue('CREATE VIEW mtcars2 AS SELECT mpg, cyl, disp FROM {ds$dataset}.mtcars')
DBI::dbExecute(con, sql)
#> [1] 0
tbl(con, "mtcars2")
#> # Source: table<mtcars2> [0 x 3]
#> # Database: BigQueryConnection
#> # ℹ 3 variables: mpg <dbl>, cyl <int64>, disp <dbl>
Created on 2023-11-07 with reprex v2.0.2.9000
Hmmmm, this happens because the metadata for a view says it has 0 rows.
Ah, the problem is that dplyr thinks that this is a table, so falls through to a faster way of downloading the data (which clearly doesn't work for views). I think the best we can do here is on tbl
creation look to see if the table is a view, and if so, record that information in the created object.