Rasgo Transforms is a library of SQL templates that can be called like functions. The templates use Jinja and are compatible with multiple Rasgo tools:
- RasgoQL, an open source Python package to transform data in SQL
- PyRasgo, a Python library for interacting with Rasgo
- Rasgo SQL Generator, a browser tool for generating SQL via a user interface
- Rasgo, our enterprise self-service analytics product
Transforms are SQL templates rendered with arguments defined in a config .yaml file. To add a transform, you must include the Jinja template file that produces the SQL and the config YAML file that defines the arguments the template needs to render. Optionally, you can also include a Python file that implements the infer_columns function, which is used to infer the columns and their types for a table created from a transform's SQL query.
The new transform should be added to the project's file structure like this:
    rasgotransforms/rasgotransforms/transforms/
        <transform name>/
            <data warehouse type (optional)>/
                <transform name>.sql
                <transform name>.yaml
            <transform name>.sql
            <transform name>.yaml
            <transform name>.py
If the SQL for a template needs to render differently for specific data warehouse types, additional template and config definitions can be added under a data warehouse-specific directory (e.g. 'snowflake').
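To make the layout above concrete, here is a minimal sketch of how a template file could be located for a given data warehouse type. The helper name `resolve_template_path` and its parameters are hypothetical, and the fallback to the default template when no warehouse-specific one exists is an assumption, not a description of how the library itself resolves templates.

```python
from pathlib import Path
from typing import Optional


def resolve_template_path(
    transforms_root: Path,
    transform_name: str,
    dw_type: Optional[str] = None,
) -> Path:
    """Hypothetical helper: pick the .sql template file for a transform."""
    transform_dir = transforms_root / transform_name

    # Prefer a warehouse-specific template, e.g. <name>/snowflake/<name>.sql
    if dw_type:
        candidate = transform_dir / dw_type / f"{transform_name}.sql"
        if candidate.is_file():
            return candidate

    # Assumption: otherwise fall back to the default template, <name>/<name>.sql
    default = transform_dir / f"{transform_name}.sql"
    if not default.is_file():
        raise FileNotFoundError(
            f"No template found for transform '{transform_name}'"
        )
    return default
```

With this layout, a transform only needs a warehouse-specific directory when its SQL actually differs for that warehouse; every other warehouse type would share the default files.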
RasgoTransforms can infer the column names and data types from a SQL query generated by one of the templates using the infer_columns function.
To enable column inference for a transform, the python file for the transform must contain a python function which takes the arguments and source columns and returns a dictionary mapping the column names to their types. The required structure is like this:
    def infer_columns(args, source_columns) -> dict:
        out_columns = source_columns.copy()
        # Logic which determines columns output from the query...
        out_columns['new_column'] = 'new_column_type'
        return out_columns
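As an illustration of that structure, the sketch below fills in the logic for an imagined transform that concatenates columns into one new text column. The argument names `concat_list` and `alias`, the default column name, and the type string `"text"` are all assumptions made for this example; a real transform would use the arguments defined in its own config YAML file.

```python
def infer_columns(args: dict, source_columns: dict) -> dict:
    # Copy the source columns so the caller's dictionary is not mutated.
    out_columns = source_columns.copy()

    # Hypothetical argument: 'alias' names the new concatenated column.
    alias = args.get("alias", "CONCAT_COLUMN")

    # Assumption: the concatenated result is a text column.
    out_columns[alias] = "text"
    return out_columns


source = {"FIRST_NAME": "text", "LAST_NAME": "text"}
result = infer_columns(
    {"concat_list": ["FIRST_NAME", "LAST_NAME"], "alias": "FULL_NAME"},
    source,
)
print(result)
# {'FIRST_NAME': 'text', 'LAST_NAME': 'text', 'FULL_NAME': 'text'}
```

Copying `source_columns` before adding to it keeps the function free of side effects, so the same source column mapping can be passed to several transforms in a chain.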
Rasgo Transforms are maintained by Rasgo.