
RudderStack provider for Apache Airflow


The Customer Data Platform for Developers



RudderStack Airflow Provider

The RudderStack Airflow Provider lets you programmatically schedule and trigger your Reverse ETL syncs from outside RudderStack and integrate them with your existing Airflow workflows.

For more information on using the Airflow Provider utility, refer to the documentation.

Installation

pip install rudderstack-airflow-provider
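To confirm that the package is visible to the Python environment that runs Airflow, a quick check with the standard library's importlib.metadata can help. This is a minimal sketch: the helper name installed_version is hypothetical, and it only looks up the distribution name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "rudderstack-airflow-provider") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "rudderstack-airflow-provider is not installed")
```

Run it with the same interpreter that your Airflow scheduler and workers use; a None result there usually means the package went into a different virtual environment.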

Usage

RudderstackOperator

Note: Use RudderstackRETLOperator for Reverse ETL connections.

A simple DAG for triggering syncs for a RudderStack source:

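from datetime import datetime, timedelta

from airflow import DAG
# Import path assumed from the provider's examples; verify against your installed version.
from rudder_airflow_provider.operators.rudderstack import RudderstackOperator

default_args = {'owner': 'airflow'}  # minimal placeholder defaults

with DAG(
    'rudderstack-sample',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['rs']
) as dag:
    # Triggers a sync for the given source using the 'rudderstack_conn' Airflow connection
    rs_operator = RudderstackOperator(
        source_id='<source-id>',
        task_id='<any-task-id>',
        connection_id='rudderstack_conn'
    )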

For the complete code, refer to this example.

Operator Parameters

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| source_id | Valid RudderStack source ID | String | None |
| task_id | A unique task ID within a DAG | String | None |
| wait_for_completion | If True, the task waits for the sync to complete. | Boolean | False |
| connection_id | The Airflow connection used to connect to the RudderStack API | String | rudderstack_default |

RudderstackOperator also accepts all the parameters supported by Airflow's BaseOperator, such as retries and retry_delay.
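Conceptually, wait_for_completion=True turns the task from fire-and-forget into trigger-then-poll. The sketch below illustrates that pattern in plain Python; it is not the provider's implementation. The get_status callable, the status strings "running", "succeeded" and "failed", and the poll_interval and timeout values are all hypothetical stand-ins.

```python
import time
from typing import Callable


def wait_for_sync(
    get_status: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout elapses.

    Hypothetical status values: "running" (keep polling), "succeeded", "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "succeeded":
            return status
        if status == "failed":
            raise RuntimeError("sync failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated status sequence standing in for real API responses
    statuses = iter(["running", "running", "succeeded"])
    print(wait_for_sync(lambda: next(statuses), poll_interval=0.0))  # prints: succeeded
```

Raising on failure or timeout is what lets Airflow mark the task as failed and apply its usual retry logic, which is why a waiting task is useful when downstream tasks depend on the synced data.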

For details on how to run the DAG in Airflow, refer to the documentation.
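Before the DAG can run, the Airflow connection named by connection_id must exist. One way to define it without the UI is Airflow's AIRFLOW_CONN_<CONN_ID> environment variable, which accepts a JSON value in Airflow 2.3 and later. The sketch below builds that variable for the rudderstack_conn ID used in the example. The connection fields (an HTTP connection whose host is the RudderStack API URL and whose password holds an access token) and the RUDDERSTACK_ACCESS_TOKEN variable are assumptions; check the RudderStack documentation for the exact values the provider expects.

```python
import json
import os
from typing import Tuple


def rudderstack_conn_env(conn_id: str, api_host: str, access_token: str) -> Tuple[str, str]:
    """Return (env var name, JSON value) that defines an Airflow connection.

    Assumption: the provider reads an HTTP connection whose host is the
    RudderStack API base URL and whose password is the access token.
    """
    name = "AIRFLOW_CONN_" + conn_id.upper()
    value = json.dumps({"conn_type": "http", "host": api_host, "password": access_token})
    return name, value


if __name__ == "__main__":
    name, value = rudderstack_conn_env(
        "rudderstack_conn",
        "https://api.rudderstack.com",  # assumed API base URL
        os.environ.get("RUDDERSTACK_ACCESS_TOKEN", "<access-token>"),  # hypothetical variable
    )
    # Print a shell export line to set in the scheduler and worker environments
    print(f"export {name}='{value}'")
```

Connections defined this way do not appear in the Airflow UI, but the operator can resolve them by ID like any other connection.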

RudderstackRETLOperator

A simple DAG for triggering syncs for a RudderStack Reverse ETL (RETL) connection:

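from datetime import datetime, timedelta

from airflow import DAG
# Import path assumed from the provider's examples; verify against your installed version.
from rudder_airflow_provider.operators.rudderstack import RudderstackRETLOperator

default_args = {'owner': 'airflow'}  # minimal placeholder defaults

with DAG(
    'rudderstack-retl-sample',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['rs']
) as dag:
    # wait_for_completion=True keeps the task running until the sync finishes
    rs_operator = RudderstackRETLOperator(
        retl_connection_id='<retl-connection-id>',
        task_id='retl-test-sync',
        connection_id='rudderstack_conn',
        sync_type='full',
        wait_for_completion=True
    )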

Operator Parameters

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| retl_connection_id | Valid RudderStack RETL connection ID | String (templatable) | None |
| task_id | A unique task ID within a DAG | String | None |
| wait_for_completion | If True, the task waits for the sync to complete. | Boolean | False |
| connection_id | The Airflow connection used to connect to the RudderStack API | String | rudderstack_default |
| sync_type | Type of sync to trigger: incremental or full | String (templatable) | incremental |
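Because sync_type is templatable, it can be decided per run rather than hard-coded. The helper below sketches one hypothetical policy (a full sync on the first day of each month, incremental otherwise) in plain Python so the rule is easy to test; the policy and the function name choose_sync_type are illustrative assumptions, not provider behavior.

```python
from datetime import date


def choose_sync_type(run_date: date) -> str:
    """Hypothetical policy: full sync on the 1st of the month, incremental otherwise."""
    return "full" if run_date.day == 1 else "incremental"


if __name__ == "__main__":
    print(choose_sync_type(date(2024, 3, 1)))  # prints: full
    print(choose_sync_type(date(2024, 3, 2)))  # prints: incremental
```

Inside a DAG, the same rule can be written directly as a Jinja template, for example sync_type="{{ 'full' if logical_date.day == 1 else 'incremental' }}", since Airflow renders templated fields against each run's context (logical_date is available in Airflow 2.2 and later).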

For details on how to run the DAG in Airflow, refer to the documentation.

Contribute

We would love to see you contribute to this project. Get more information on how to contribute here.

License

The RudderStack Airflow Provider is released under the MIT License.

Contact Us

For more information or queries on this feature, you can contact us or start a conversation in our Slack community.