/queries

Store parameterized queries

Primary LanguageROtherNOASSERTION

queries

I have a number of parameterized SQL queries that I run regularly. The purpose of this package is to keep canonical versions of those queries with standardized parameterization. It uses the glue package to replace parameters.

Usage

The package reads modified sql files that have metadata stored in yaml format in a comment block at the top. Here’s a simple example: start with a file like [tests/testthat/example_with_defaults.sql](tests/testthat/example_with_defaults.sql).

cat(readr::read_file("tests/testthat/example_with_defaults.sql"))
#> -- name: Sales by group
#> -- description: Computes metrics grouped by dimensions. If the description is
#> --  really long you can just continue on the next line with a single-space
#> --  indent.
#> -- params:
#> --   - name: dimensions
#> --     description: dimensions to group by
#> --     default: country
#> --   - name: metrics
#> --     description: metrics to aggregate by dimension
#> 
#> SELECT {comma_join(metrics, trailing_comma = TRUE)} {comma_join(dimensions)}
#> FROM Customers
#> GROUP BY {comma_join(dimensions)}

The header is written in yaml format. I can import this into R with

library(queries)
query <- query_load("tests/testthat/example_with_defaults.sql")

and then plug in parameters with

query_substitute(query, metrics = c(total_sales = "SUM(Sales)", avg_sales = "AVG(Sales)")) %>% 
  cat
#> SELECT SUM(Sales) as total_sales, AVG(Sales) as avg_sales, country
#> FROM Customers
#> GROUP BY country

Notice that vector names are converted to column identifiers.

Defaults are applied as specified in the yaml header. You can also create a function that completes the query:

qf <- query_as_function(query)
qf(dimensions = c("Country", "Segment", "Product"), metrics = c(Sales = "SUM(Sales)"))
#> SELECT SUM(Sales) as Sales, Country, Segment, Product
#> FROM Customers
#> GROUP BY Country, Segment, Product

The created function has arguments corresponding to the params as specified in the yaml header.

print(args(qf))
#> function (dimensions = "country", metrics = NULL) 
#> NULL

RStudio will see these and autocomplete them, which is convenient. When I forget a parameter, it tells me.

qf(dimensions = "Segment")
#> Error in qf(dimensions = "Segment"): Missing params with no default:
#> - metrics

Big picture usage

Query Library

I use this in two ways. First, I have a library of queries stored in a repository—say ~/projects/path/to/queries. These are canonical versions of parameterized queries that my team uses. In my .Rprofile, I have a line that sets the default query location to that path

options(default_queries_location='~/projects/path/to/queries')

Which contains files

  • sales_by_group.sql
  • customer_list.sql
  • etc…

Whenever I need to run one of these queries, I build the query with

q <- load_query('sales_by_group')

If I don’t remember all the parameters, I can view the header with head:

head(query)
#> name: Sales by group
#> description: Computes metrics grouped by dimensions. If the description is
#>  really long you can just continue on the next line with a single-space
#>  indent.
#> params:
#>   - name: dimensions
#>     description: dimensions to group by
#>     default: country
#>   - name: metrics
#>     description: metrics to aggregate by dimension

R Projects

The other way that I use this is to store bespoke queries in a /sql directory in an R project. load_query will look for a /sql directory first, then check whether the default_query_location option is set up.

Installation

Install with devtools:

devtools::install_github("colin-fraser/queries")