SQL-based transforms compatible with Rasgo and PyRasgo

Primary language: Jupyter Notebook. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Rasgo Transforms

Rasgo Transforms is a library of SQL templates that can be called like functions. The templates use Jinja and are compatible with several Rasgo tools:

  • RasgoQL, an open-source Python package to transform data in SQL
  • PyRasgo, a Python library for interacting with Rasgo
  • Rasgo SQL Generator, a browser tool for generating SQL via a user interface
  • Rasgo, our enterprise self-service analytics product

Writing Transforms

Transforms are SQL templates that are rendered using arguments defined in a config .yaml file. To add a transform, you must include the Jinja template file that produces the SQL and the config YAML file that defines the arguments the template needs to render. Optionally, you can also include a Python file that implements the infer_columns function, which is used to infer the columns, and their types, of a table created from a transform's SQL query.

The new transform should be added to the project's file structure like this:

rasgotransforms/rasgotransforms/transforms/
    <transform name>/
        <data warehouse type (optional)>/
            <transform name>.sql
            <transform name>.yaml
        <transform name>.sql
        <transform name>.yaml
        <transform name>.py

If the SQL for a template needs to render differently for specific data warehouse types, additional template and config definitions can be added under a data-warehouse-specific directory (e.g. 'snowflake').
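The layout above can be explored with a small lookup sketch. The helper name resolve_transform_files, its parameters, and the fallback rule (use the warehouse-specific file when it exists, otherwise the default one, decided separately for the .sql and .yaml files) are assumptions made for illustration. They are not taken from the RasgoTransforms code base.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def resolve_transform_files(
    transforms_root: Path, name: str, dw_type: Optional[str] = None
) -> Dict[str, Path]:
    """Hypothetical helper: locate the .sql and .yaml files for a transform.

    Assumption: a file under <name>/<dw_type>/ takes precedence over the
    default file directly under <name>/ when it exists.
    """
    base = transforms_root / name
    resolved = {}
    for ext in ("sql", "yaml"):
        default = base / f"{name}.{ext}"
        override = base / dw_type / f"{name}.{ext}" if dw_type else None
        use_override = override is not None and override.is_file()
        resolved[ext] = override if use_override else default
    return resolved


# Build a throwaway directory tree that mirrors the documented structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_transform" / "snowflake").mkdir(parents=True)
    for rel in (
        "my_transform/my_transform.sql",
        "my_transform/my_transform.yaml",
        "my_transform/snowflake/my_transform.sql",
        "my_transform/snowflake/my_transform.yaml",
    ):
        (root / rel).touch()

    # A warehouse with its own directory gets the specific template.
    snowflake_sql = (
        resolve_transform_files(root, "my_transform", "snowflake")["sql"]
        .relative_to(root)
        .as_posix()
    )
    # A warehouse without one falls back to the default template.
    bigquery_sql = (
        resolve_transform_files(root, "my_transform", "bigquery")["sql"]
        .relative_to(root)
        .as_posix()
    )
```

Under these assumptions, a transform only needs a warehouse directory for the warehouses whose SQL actually differs; every other warehouse shares the default files.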

Inferring Columns

RasgoTransforms can infer the column names and data types produced by a SQL query generated from one of the templates, using the infer_columns function.

To enable column inference for a transform, the Python file for the transform must contain a function that takes the transform arguments and the source columns and returns a dictionary mapping column names to their types. The required structure is like this:

def infer_columns(args: dict, source_columns: dict) -> dict:
    # Copy the source columns (name -> type) so the input is not mutated
    out_columns = source_columns.copy()
    # Logic which determines columns output from the query...
    out_columns['new_column'] = 'new_column_type'
    return out_columns
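As a more concrete illustration of the skeleton above, here is a sketch for a hypothetical transform that drops some columns and appends one integer column. The argument names new_column_name and drop_columns, and the type strings such as 'INTEGER', are assumptions chosen for this example. A real transform would use the arguments declared in its own config YAML and the type names of its data warehouse.

```python
def infer_columns(args: dict, source_columns: dict) -> dict:
    """Sketch for a hypothetical transform; argument names are illustrative."""
    # Work on a copy so the caller's column mapping is left untouched.
    out_columns = source_columns.copy()
    # Remove any columns the (hypothetical) transform drops from its output.
    for column in args.get("drop_columns", []):
        out_columns.pop(column, None)
    # Add the column the transform creates, with an assumed type name.
    out_columns[args["new_column_name"]] = "INTEGER"
    return out_columns


source = {"ID": "INTEGER", "NAME": "TEXT", "CREATED_AT": "TIMESTAMP"}
result = infer_columns(
    {"new_column_name": "ROW_NUM", "drop_columns": ["CREATED_AT"]},
    source,
)
```

Here result maps ID, NAME, and ROW_NUM to their types, while source still contains CREATED_AT because the function works on a copy.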

About Us

Rasgo Transforms is maintained by Rasgo.