/ABSQL

Render SQL queries using file frontmatter

Primary LanguagePythonMIT LicenseMIT

logo

PyPi build codecov Code Style


ABSQL is a format for templated SQL, and the ABSQL package is a rendering engine for this format.

ABSQL allows users to inject limitless context - text, objects, and functions - into their SQL templates, both within the rendering engine (the Runner) and in the SQL file itself.

The flexibility and extendibility of ABSQL affords SQL engineers a consistent interface for generating templated SQL, while simultaneously giving downstream orchestration solutions (e.g. Airflow) the opportunity to develop orchestrator-specific implementations for ABSQL.

ABSQL leverages the familiar Jinja2 templating engine, as well as YAML-style frontmatter to provide users with a unique and intuitive SQL authoring experience.

Getting Started

Rendering Text

The ABSQL Runner is robust, but we will start simple. The static render_text method will render user-provided text with any provided context.

from absql import Runner

Runner.render_text('{{greeting}}, World!', greeting="Hello")
# 'Hello, World!'

Reusing Context

We might want to reuse the context we provide. In this case, we can instantiate a runner and provide it with context. We can then render any text with this saved context. An instantiated runner has a render method, that works with text and - as we will see later - file paths, as well.

from absql import Runner

r = Runner(greeting = "Hello")
r.render("{{greeting}}, World!")
# 'Hello, World!'

Rendering Files (and Frontmatter Context)

The runner is not bound to plain text, and the context provided by the user is not confined to just the runner. Context can also be passed in through a frontmatter block at the top of a .sql file. This frontmatter block is parsed as YAML.

In the example file below, my_file.sql, the table_name context is provided in the frontmatter block, which starts and ends with three hyphens (---). The body of the file contains the templated SQL we want to render.

---
table_name: my_table
---

SELECT *
FROM {{table_name}}
WHERE greeting = '{{greeting}}'

Now, using the same runner as above, we can render the SQL, with our table_name context being provided from the file itself, and the greeting context being fed in from the runner.

r.render("my_file.sql")

# SELECT *
# FROM my_table
# WHERE greeting = 'Hello'

Providing an Engine

There are cases when templated queries require data from a database. In ABSQL, this can be accomplished by providing a SQLAlchemy engine to a runner, and using some of the default ABSQL-provided functions. While we could instantiate a new runner with this engine, we will use our existing runner's set_context method to add a SQLite engine. It is important that we set the engine as engine in the context.

from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")

r.set_context(engine=engine)

Now, we can update our query in my_file.sql to use ABSQL's builtin table_exists function. We will arbitrarily add a limit to our query, based on if a table exists.

---
table_name: my_table
---

SELECT *
FROM {{table_name}}
WHERE greeting = '{{greeting}}'
{% if table_exists("a_nonexistent_table") %}
LIMIT 10
{% else %}
LIMIT 1
{% endif %}

Then, we can render again.

r.render("my_file.sql")

# SELECT *
# FROM my_table
# WHERE greeting = 'Hello'
# LIMIT 1

See below for a full list of builtin functions.

Functions and Context

Default Functions

The following functions are available in any instantiated runner:

Category Function Description Example
database get_max_value Get the maximum value of a column. get_max_value("my_schema.my_table.my_column")
database get_min_value Get the minimum value of a column. get_min_value("my_schema.my_table.my_column")
database table_exists TRUE/FALSE if the table exists in the database. table_exists("my_schema.my_table")
database query_db Send any query to the database and get back the results. query_db("SELECT name FROM my_schema.my_table")
env env_switch Switch a value based on the given environment (an environment variable of "ENV" by default). env_switch(dev=10, prod=500, default=30)
env env_var Retrieve a value from an environment variable. env_var("MY_ENVIRONMENT_VARIABLE")
time datetime Python's datetime.datetime function. datetime(2022, 1, 1)
time previous_date Get a date string for the previous date (based on UTC by default). previous_date()
time previous_hour Get a datetime string the previous hour (based on UTC by default). previous_hour()
time timedelta Python's datetime.timedelta function. timedelta(hours=1)

For the database functions, it is recommended that you pass in an engine to the runner ahead of rendering (e.g. runner = Runner(engine=engine)). If no engine is provided, the runner will search for a URI in the environment variable AB__URI.

Also note that these functions are accessible in both the frontmatter and body of a file. Below is an example illustrating this.

---
schema: "{{ env_switch(prod='prod_schema', dev=env_var('DEV_SCHEMA')) }}"
---

SELECT *
FROM {{schema}}.my_table

Adding Functions and Context

As mentioned elsewhere, users can add functions and context:

  • In a file's frontmatter
  • When instantiating a new runner
  • By using the set_context method on an existing runner

Note that functions which contain a keyword argument engine will automatically be passed the engine object attached to the runner by default. Users can set desired keyword arguments to automatically be added to functions by passing a list of keyword argument names to the partial_kwargs argument, available in any of the render_[x] functions, as well as when instantiating a Runner object.

Environment Configuration

While no configuration is required to get started with ABSQL, there are two environment variables you can set:

  • AB__URI - The URI for the database that should be used by ABSQL's default database functions. As mentioned, it is advisable that you just pass in your own SQLAlchemy engine object to a runner under the context name engine, if you are going to use the database functions.
  • AB__ENV - The name of the environment variable that designates what the environment is. By default, ABSQL assumes this environment variable is called "ENV".

Development

The below assumes you have Docker installed.

Starting for the First Time

First, git clone this repository.

Then:

export ABSQL_HOME="~/path/to/this/project"
cd $ABSQL_HOME
make build-image
make run-image

The above will build the development image under the name absql-testing and run a container called absql-testing.

From here, you can:

  • make exec - Exec into a terminal in the running container.
  • make test - Runs pytest in a temporary container.
  • make coverage - Runs pytest and generates a coverage report.
  • make browse-coverage - Opens up the coverage report in your browser.
  • make stop-container - Stop the running container.
  • make start-container - Start a stopped container.

Rebuilding the Image

make stop-container # if you have a running container
make remove-container # if you have a stopped container
make build-image
make run-image