airbyte-local-cli


Primary language: Shell. License: Apache-2.0.


CLI for running Airbyte sources & destinations locally or on a Kubernetes cluster without an Airbyte server


Example Usage

Requirements: bash, jq, tee. Additionally, docker when running syncs locally, or kubectl when running on a Kubernetes cluster.
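The requirements above can be checked before a run with a small preflight function. This is a minimal sketch and not part of airbyte-local.sh; the function name require_cmds and the RUNTIME_CMD variable are hypothetical.

```shell
# Hypothetical helper (not part of airbyte-local.sh): report any missing commands.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools needed for every run, plus the one specific to the chosen runtime.
# Set RUNTIME_CMD=kubectl when targeting a Kubernetes cluster.
require_cmds bash jq tee "${RUNTIME_CMD:-docker}" \
  || echo "install the tools listed above before running a sync" >&2
```

The function returns a non-zero status when anything is missing, so it can also gate a wrapper script.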

Either download the script manually or invoke the script directly with curl:

bash <(curl -s https://raw.githubusercontent.com/faros-ai/airbyte-local-cli/main/airbyte-local.sh) --help

For example, here is how to sync the ServiceNow source with the Faros Cloud destination:

./airbyte-local.sh \
  --src 'farosai/airbyte-servicenow-source' \
  --src.username '<source_username>' \
  --src.password '<source_password>' \
  --src.url '<source_url>' \
  --dst 'farosai/airbyte-faros-destination' \
  --dst.edition_configs.edition 'cloud' \
  --dst.edition_configs.api_url '<faros_api_url>' \
  --dst.edition_configs.api_key '<faros_api_key>' \
  --dst.edition_configs.graph 'default' \
  --state state.json \
  --check-connection

Or with Faros Community Edition as the destination:

./airbyte-local.sh \
  --src 'farosai/airbyte-servicenow-source' \
  --src.username '<source_username>' \
  --src.password '<source_password>' \
  --src.url '<source_url>' \
  --dst 'farosai/airbyte-faros-destination' \
  --dst.edition_configs.edition 'community' \
  --dst.edition_configs.hasura_admin_secret 'admin' \
  --dst.edition_configs.hasura_url 'http://host.docker.internal:8080/' \
  --state state.json \
  --check-connection

Note: The src.* and dst.* arguments will differ depending on the source and destination being used.

Or on a Kubernetes cluster:

./airbyte-local.sh \
  --src 'farosai/airbyte-servicenow-source' \
  --src.username '<source_username>' \
  --src.password '<source_password>' \
  --src.url '<source_url>' \
  --dst 'farosai/airbyte-faros-destination' \
  --dst.edition_configs.edition 'cloud' \
  --dst.edition_configs.api_url '<faros_api_url>' \
  --dst.edition_configs.api_key '<faros_api_key>' \
  --dst.edition_configs.graph 'default' \
  --state state.json \
  --k8s-deployment \
  --k8s-namespace default \
  --max-cpus 0.5 \
  --max-mem 500Mi \
  --keep-containers

Note: The command assumes that the Kubernetes cluster context and credentials are already configured. For more information, see the official docs.
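Because the script relies on an existing kubectl configuration, a quick check before a Kubernetes run can save a failed deployment. The following is a minimal sketch built on standard kubectl subcommands; the function name check_k8s_access is hypothetical and the check is not part of airbyte-local.sh.

```shell
# Hypothetical preflight check: confirm that a context is selected and that
# the current credentials may create pods in the target namespace.
check_k8s_access() {
  local namespace="${1:-default}" context
  if ! context=$(kubectl config current-context 2>/dev/null); then
    echo "kubectl is missing or no current context is set" >&2
    return 1
  fi
  if ! kubectl auth can-i create pods --namespace "$namespace" >/dev/null 2>&1; then
    echo "context '$context' cannot create pods in namespace '$namespace'" >&2
    return 1
  fi
  echo "using context '$context', namespace '$namespace'"
}
```

Call it with the namespace you plan to pass to --k8s-namespace, and only start the sync if it succeeds.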

Configuring Faros source/destination using a wizard

Note: The wizard works with Faros sources and the Faros destination only. It is not supported with Kubernetes deployment.

Instead of passing src.* and dst.* arguments, you can invoke a configuration wizard for the Faros source and/or destination:

./airbyte-local.sh \
  --src 'farosai/airbyte-servicenow-source' \
  --src-wizard \
  --dst 'farosai/airbyte-faros-destination' \
  --dst-wizard

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `--src <image>` | Yes | Airbyte source Docker image |
| `--dst <image>` | Yes | Airbyte destination Docker image |
| `--src.<key> <value>` | | Append `"key": "value"` into the source config * |
| `--dst.<key> <value>` | | Append `"key": "value"` into the destination config * |
| `--check-connection` | | Validate the Airbyte source connection |
| `--full-refresh` | | Force source full_refresh and destination overwrite mode |
| `--state <path>` | | Override state file path for incremental sync |
| `--src-output-file <path>` | | Write source output as a file (handy for debugging) |
| `--src-catalog-overrides <json>` | | JSON string of sync mode overrides. See Overriding Default Catalog |
| `--src-config-file <path>` | | Source config file path |
| `--src-config-json <json>` | | Source config as a JSON string |
| `--src-catalog-file <path>` | | Source catalog file path |
| `--src-catalog-json <json>` | | Source catalog as a JSON string |
| `--dst-config-file <path>` | | Destination config file path |
| `--dst-config-json <json>` | | Destination config as a JSON string |
| `--dst-catalog-file <path>` | | Destination catalog file path |
| `--dst-catalog-json <json>` | | Destination catalog as a JSON string |
| `--dst-stream-prefix <prefix>` | | Destination stream prefix |
| `--no-src-pull` | | Skip pulling Airbyte source image |
| `--no-dst-pull` | | Skip pulling Airbyte destination image |
| `--src-wizard` | | Run the Airbyte source configuration wizard |
| `--dst-wizard` | | Run the Airbyte destination configuration wizard |
| `--src-only` | | Only run the Airbyte source |
| `--dst-only <file>` | | Use a file for destination input instead of a source |
| `--connection-name` | | Connection name used in various places |
| `--raw-messages` | | Output raw Airbyte messages, i.e., without a log prefix or colors |
| `--max-log-size <size>` | | Set Docker maximum log size |
| `--max-mem <mem>` | | Set the maximum amount of memory for each Docker or Kubernetes container, e.g., "1g" or "1024Mi" |
| `--max-cpus <cpus>` | | Set the maximum number of CPUs for each Docker or Kubernetes container, e.g., "1" or "1000m" |
| `--src-docker-options "<string>"` | | Set additional options to pass to the `docker run <src>` command, e.g., `--src-docker-options "-e NODE_OPTIONS=--max_old_space_size=2000 -e NODE_TLS_REJECT_UNAUTHORIZED=0"` |
| `--dst-docker-options "<string>"` | | Set additional options to pass to the `docker run <dst>` command, e.g., `--dst-docker-options "-e NODE_OPTIONS=--max_old_space_size=2000"` |
| `--k8s-deployment` | | Deploy and run source/destination connectors as a pod on a Kubernetes cluster |
| `--k8s-namespace <name>` | | Kubernetes namespace where the source/destination connectors pod is deployed |
| `--keep-containers` | | Do not delete source and destination containers (or Kubernetes pod) after they exit |
| `--debug` | | Enable debug logging |
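For larger configs, long lists of --src.<key> and --dst.<key> arguments become unwieldy. As a sketch, the same settings can be assembled with jq and passed through --src-config-json or --dst-config-file instead. This assumes that dotted keys such as dst.edition_configs.edition correspond to nested JSON objects, as the examples above suggest. The environment variable names and the run_sync function are hypothetical.

```shell
# Build the ServiceNow source config as a compact JSON string.
# SRC_USERNAME, SRC_PASSWORD and SRC_URL are hypothetical variable names;
# the placeholders are used when they are unset.
src_config=$(jq -cn \
  --arg username "${SRC_USERNAME:-<source_username>}" \
  --arg password "${SRC_PASSWORD:-<source_password>}" \
  --arg url "${SRC_URL:-<source_url>}" \
  '{username: $username, password: $password, url: $url}')

# Write the Faros Cloud destination config to a temporary file.
# Assumption: dotted --dst.* keys map to nested objects like this one.
dst_config_file=$(mktemp)
jq -n \
  --arg api_url "${FAROS_API_URL:-<faros_api_url>}" \
  --arg api_key "${FAROS_API_KEY:-<faros_api_key>}" \
  '{edition_configs: {edition: "cloud", api_url: $api_url, api_key: $api_key, graph: "default"}}' \
  > "$dst_config_file"

# Hypothetical wrapper: defined here, run it only once the configs look right.
run_sync() {
  ./airbyte-local.sh \
    --src 'farosai/airbyte-servicenow-source' \
    --src-config-json "$src_config" \
    --dst 'farosai/airbyte-faros-destination' \
    --dst-config-file "$dst_config_file" \
    --state state.json
}
```

Letting jq build the JSON avoids quoting mistakes when a password or URL contains characters that are special to the shell or to JSON.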

Note: When passing an array value for a parameter, specify it as a JSON array, for example:

--src.projects '["project-1","project-2","project-3"]'
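Hand-writing a JSON array inside shell quotes is easy to get wrong. A minimal sketch that lets jq do the quoting, assuming jq 1.6 or later (needed for --args); the json_array function name is hypothetical.

```shell
# Hypothetical helper: print its arguments as a compact JSON array of strings.
json_array() {
  jq -cn '$ARGS.positional' --args "$@"
}

projects=$(json_array project-1 project-2 project-3)
echo "$projects"
# prints ["project-1","project-2","project-3"]
```

The result can then be passed as --src.projects "$projects".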

Overriding Default Catalog

To generate the Airbyte catalog needed for running the source and destination connectors, the script runs the discover command on the source to get the list of all supported streams. It then creates an Airbyte configured catalog that enables every stream and uses "incremental" sync mode for each stream that supports it. Each stream's destination sync mode defaults to "append" for incremental streams and "overwrite" for full_refresh streams. To disable a stream, or to customize its sync mode or destination sync mode, pass a --src-catalog-overrides option whose value is a JSON string in the following format:

{
  "<stream name 1>": { "disabled": true },
  "<stream name 2>": {
    "sync_mode": "full_refresh",
    "destination_sync_mode": "append"
  }
}
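Because the overrides are passed as a single argument, it can help to keep the JSON readable in a heredoc and let jq validate and compact it. A sketch follows; the stream names incidents and users are placeholders.

```shell
# Stream names below are placeholders; use the names reported by your source.
overrides=$(jq -c . <<'EOF'
{
  "incidents": { "disabled": true },
  "users": {
    "sync_mode": "full_refresh",
    "destination_sync_mode": "append"
  }
}
EOF
) || echo "overrides are not valid JSON" >&2

# One-line JSON, ready to be passed as a single argument.
echo "$overrides"
```

Pass the result as --src-catalog-overrides "$overrides", keeping the double quotes so the shell does not split the string.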

You can also force full_refresh sync mode, with overwrite as the destination sync mode, for all streams by passing the --full-refresh flag.
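To make the defaults in this section concrete, the following jq sketch derives a configured catalog from a discover result and applies overrides. It illustrates the rules as described here and is not the script's actual implementation. The sample streams and the configure_catalog function are made up, and the sketch assumes that an overridden sync_mode also determines the default destination sync mode.

```shell
# Hypothetical illustration of the catalog rules; not taken from airbyte-local.sh.
# Reads a discovered catalog on stdin; $1 is the overrides JSON (default: {}).
configure_catalog() {
  local overrides="${1:-}"
  [ -n "$overrides" ] || overrides='{}'
  jq -c --argjson overrides "$overrides" '
    .streams
    | map(
        . as $s
        | ($overrides[$s.name] // {}) as $o
        | select($o.disabled != true)            # drop disabled streams
        | (if (($s.supported_sync_modes // []) | index("incremental")) != null
           then "incremental" else "full_refresh" end) as $default_mode
        | ($o.sync_mode // $default_mode) as $mode
        | {
            stream: $s,
            sync_mode: $mode,
            destination_sync_mode:
              ($o.destination_sync_mode
               // (if $mode == "incremental" then "append" else "overwrite" end))
          }
      )
    | {streams: .}
  '
}

# Made-up discover output with four streams.
discovered='{"streams":[
  {"name":"incidents","supported_sync_modes":["full_refresh","incremental"]},
  {"name":"users","supported_sync_modes":["full_refresh"]},
  {"name":"groups","supported_sync_modes":["full_refresh","incremental"]},
  {"name":"tasks","supported_sync_modes":["full_refresh","incremental"]}
]}'
overrides='{"incidents":{"disabled":true},"groups":{"sync_mode":"full_refresh","destination_sync_mode":"append"}}'

printf '%s' "$discovered" | configure_catalog "$overrides" \
  | jq -r '.streams[] | "\(.stream.name) \(.sync_mode) \(.destination_sync_mode)"'
# prints:
# users full_refresh overwrite
# groups full_refresh append
# tasks incremental append
```

Here incidents is dropped because it is disabled, users falls back to full_refresh because it does not support incremental, groups takes both values from its override, and tasks gets the incremental defaults.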