Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

Soda GitHub Action

Soda enables Data Engineers to test data for quality where and when they need to. It works by taking the data quality checks that you prepare and using them to run a scan of datasets in a data source.

A scan is a CLI command that instructs Soda to prepare optimized SQL queries, which execute data quality checks on your data source to find invalid, missing, or unexpected data. When checks fail, they surface bad-quality data and present check results that help you investigate and address quality issues.
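To make the idea of prepared checks concrete, here is a minimal sketch of a checks file written in SodaCL, the check language that Soda Library reads. The dataset name `dim_customer` and the column names are hypothetical; substitute names from your own data source.

```yaml
# checks.yaml (illustrative; dataset and column names are assumptions)
checks for dim_customer:
  # Fail if the dataset is empty
  - row_count > 0
  # Fail if any row is missing an email address
  - missing_count(email) = 0
  # Fail if customer_id values are repeated
  - duplicate_count(customer_id) = 0
```

Each list item is one check; a scan evaluates all of them and reports a result for each.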

Add the GitHub Action for Soda to your GitHub Workflow to automatically execute scans for data quality during development.

For example, in a repository in which you are adding a transformation or making changes to a dbt model, you can add the Soda GitHub Action to your workflow. With each new PR, or commit to an existing PR, it executes a Soda scan for data quality and presents the results of the scan in a comment in the pull request, and in a report in Soda Cloud.

Where the scan results indicate an issue with data quality, Soda notifies you both in the PR comment and by email, so that you can investigate and address any issues before merging the PR into production.

Refer to Soda documentation for an example use case.

Use the Soda GitHub Action

Add the action to your GitHub Workflow, as in the following example in the Perform Soda Scan step.

name: Scan for data quality

on: pull_request
jobs:
  soda_scan:
    runs-on: ubuntu-latest
    name: Run Soda Scan
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Perform Soda Scan
        uses: sodadata/soda-github-action@v1
        env:
          SODA_CLOUD_API_KEY: ${{ secrets.SODA_CLOUD_API_KEY }}
          SODA_CLOUD_API_SECRET: ${{ secrets.SODA_CLOUD_API_SECRET }}
          SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        with:
          soda_library_version: v1.0.4
          data_source: snowflake
          configuration: ./configuration.yaml
          checks: ./checks.yaml

Refer to testing files and the test workflow for more context for the example.
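For orientation, the configuration file referenced in the workflow might look like the sketch below. It assumes that the data source is named `snowflake` (matching the `data_source` input) and that credentials are read from the environment variables set in the `env` block. The account, database, warehouse, and schema values are placeholders, and the exact connection keys should be confirmed against the Soda documentation for your data source.

```yaml
# configuration.yaml (illustrative sketch; connection values are placeholders)
data_source snowflake:
  type: snowflake
  # Resolved from the environment variables passed to the action
  username: ${SNOWFLAKE_USERNAME}
  password: ${SNOWFLAKE_PASSWORD}
  account: my_account        # placeholder
  database: MY_DATABASE      # placeholder
  warehouse: MY_WAREHOUSE    # placeholder
  schema: PUBLIC             # placeholder

# Connection to Soda Cloud, where the scan report is published
soda_cloud:
  host: cloud.soda.io
  api_key_id: ${SODA_CLOUD_API_KEY}
  api_key_secret: ${SODA_CLOUD_API_SECRET}
```

Keeping secrets in environment variables rather than in the file means the configuration can be committed to the repository without exposing credentials.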

Action inputs

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| soda_library_version | Version of the Soda Library that runs the scan. Supply a specific version, such as v1.0.4, or latest. See soda-library Docker images for possible versions. Compatible with Soda Library 1.0.4 and higher. | | - |
| data_source | Name of the data source on which to perform the scan. | | - |
| configuration | File path to the configuration YAML file. See Soda docs. | | - |
| checks | File path to the checks YAML file. See Soda docs. Compatible with shell filename extensions. Identify multiple check files, if you wish. For example: ./checks_*.yaml or ./{check1.yaml,check2.yaml} | | - |
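As a sketch of how the `checks` input accepts a filename pattern, the step below scans every checks file that matches `./checks_*.yaml`. The file names in the comment are hypothetical, and the `env` block from the example workflow is omitted for brevity.

```yaml
      - name: Perform Soda Scan
        uses: sodadata/soda-github-action@v1
        with:
          soda_library_version: v1.0.4
          data_source: snowflake
          configuration: ./configuration.yaml
          # Matches, for example, checks_orders.yaml and checks_customers.yaml
          checks: ./checks_*.yaml
```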

Self-hosted runners

  • Windows runners are not supported, including the use of official Windows-based images such as windows-latest.
  • macOS runners require installation of Docker because macos-latest does not come with Docker pre-installed.
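A job can be routed to a self-hosted Linux runner with the standard `runs-on` labels, as in the sketch below. It assumes that the runner carries the default `self-hosted` and `linux` labels and that Docker is already installed on it, which the notes above suggest the action relies on.

```yaml
jobs:
  soda_scan:
    # Assumption: a self-hosted Linux runner with Docker installed
    runs-on: [self-hosted, linux]
    name: Run Soda Scan
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      # Add the Perform Soda Scan step here, as in the example workflow
```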

Access Soda documentation for more information.