spark-utils

Utility functions for dbt projects running on Spark


This dbt package contains macros that:

  • can be (re)used across dbt projects running on Spark
  • define Spark-specific implementations of dispatched macros from other packages

Installation Instructions

Check dbt Hub for the latest installation instructions, or read the docs for more information on installing packages.
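As a minimal sketch, an entry in your packages.yml might look like the following (the version range is illustrative; check dbt Hub for the current release), after which you run dbt deps:

packages:
  - package: dbt-labs/spark_utils
    version: [">=0.3.0", "<0.4.0"]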


Compatibility

This package provides "shims" for:

  • dbt_utils, except for:
    • dbt_utils.get_relations_by_prefix_sql
    • dbt_utils.get_tables_by_pattern_sql
    • dbt_utils.get_tables_by_prefix
    • dbt_utils.get_tables_by_pattern
  • snowplow (tested on Databricks only)

To use these shims, set a dispatch config in your root project's dbt_project.yml (dbt v0.20.0 and newer). For example, with the setting below, dbt will first search the spark_utils package for macro implementations when resolving macros from the dbt_utils namespace:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
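With that config in place, existing calls to dbt_utils macros resolve to their Spark implementations without any changes to your models. As a sketch, assuming a dbt_utils version that still ships datediff (the model and column names here are hypothetical):

select
  order_id,
  {{ dbt_utils.datediff('ordered_at', 'shipped_at', 'day') }} as days_to_ship
from {{ ref('orders') }}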

Note to maintainers of other packages

The spark-utils package may be able to provide compatibility for your package, especially if your package leverages dbt-utils macros for cross-database compatibility. This package does not need to be specified as a dependency of your package in packages.yml. Instead, you should encourage anyone using your package on Apache Spark / Databricks to:

  • Install spark_utils alongside your package
  • Add a dispatch config in their root project, like the one above
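For instance, a user running the snowplow package on Spark might add something like the following to their own project (package versions are illustrative):

packages:
  - package: dbt-labs/spark_utils
    version: [">=0.3.0", "<0.4.0"]
  - package: dbt-labs/snowplow
    version: [">=0.15.0", "<0.16.0"]

dispatch:
  - macro_namespace: snowplow
    search_order: ['spark_utils', 'snowplow']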

Contributing

We welcome contributions to this repo! To contribute a new feature or a fix, please open a Pull Request with 1) your changes and 2) corresponding updates to the README.md documentation.


Getting started with dbt + Spark

Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct.