/tdk-on-gcp

Overview

Public repository for the configuration files, user guide, and other resources to run the Synthesized TDK on GCP

What is Synthesized TDK?

Available on the GCP Cloud Marketplace: https://console.cloud.google.com/marketplace/product/synthesized-marketplace-public/synthesized-tdk

Installation

Once installation completes, a TDK CronJob is created in your cluster.

Prerequisites

Databases

Please check the list of supported database dialects: https://docs.synthesized.io/tdk/latest/#_whats_included.

It is assumed that there are two databases: input and output. You must know the JDBC URL, username and password for each.
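As a rough illustration of what these values look like, the sketch below checks that a string carries the `jdbc:` prefix that JDBC URLs start with. The helper name `is_jdbc_url`, the host names, and the database names are hypothetical, and PostgreSQL is used only as a familiar URL shape; the part after the prefix depends on your database driver.

```shell
# Hypothetical helper: succeed only if the argument starts with "jdbc:"
# followed by at least one more character.
is_jdbc_url() {
  case "$1" in
    jdbc:?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example values only; hosts, ports, and database names are placeholders.
input_url='jdbc:postgresql://input-db.example.com:5432/source'
output_url='jdbc:postgresql://output-db.example.com:5432/target'

for url in "$input_url" "$output_url"; do
  if is_jdbc_url "$url"; then
    echo "ok: $url"
  else
    echo "not a JDBC URL: $url" >&2
  fi
done
```

This only catches a missing prefix. It does not confirm that the database is reachable or that the dialect is supported.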

Synthesized config

You must provide a Synthesized YAML configuration.

This can be a MASKING, GENERATION, or KEEP configuration.

For example, the following config can be used:

default_config:
  mode: MASKING
safety_mode: RELAXED

Quick install with Google Cloud Marketplace

To install Synthesized TDK to a Google Kubernetes Engine cluster via Google Cloud Marketplace, follow the on-screen instructions.

Command-line instructions

NOTE: The CLI installation is only available if you have successfully deployed TDK from the marketplace and a reporting service key was generated.

Prerequisites

Setting up command-line tools

You need the following tools in your development environment: gcloud, kubectl, docker, git, and helm.

Configure gcloud as a Docker credential helper:

gcloud auth configure-docker
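Before continuing, it can help to confirm that the command-line tools are on your PATH. The sketch below is one way to do that; the helper name `check_tools` is hypothetical, and the tool list is an assumption inferred from the commands used in this guide.

```shell
# Report every tool from the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tool list inferred from the commands used in this guide (an assumption).
check_tools gcloud kubectl docker git helm \
  || echo "Install the missing tools before continuing." >&2
```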

Creating a Google Kubernetes Engine (GKE) cluster

Create a new cluster from the command line. You can change the values of CLUSTER and ZONE to suit your environment.

export CLUSTER=tdk-cluster
export ZONE=us-west1-a

gcloud container clusters create "${CLUSTER}" --zone "${ZONE}"

Configure kubectl to connect to the new cluster:

gcloud container clusters get-credentials "${CLUSTER}" --zone "${ZONE}"

Cloning this repo

Clone this repo, as well as its associated tools repo:

git clone --recursive https://github.com/synthesized-io/tdk-on-gcp.git

Installing the Application resource definition

An Application resource is a collection of individual Kubernetes components, such as Services, Deployments, and so on, that you can manage as a group.

To set up your cluster to understand Application resources, run the following command:

kubectl apply -f "https://raw.githubusercontent.com/GoogleCloudPlatform/marketplace-k8s-app-tools/master/crd/app-crd.yaml"

You need to run this command once.

The Application resource is defined by the Kubernetes SIG-apps community. You can find the source code at github.com/kubernetes-sigs/application.

Installing the app

Configuring the app with environment variables

Choose an instance name and namespace for the app.

export APP_INSTANCE_NAME=synthesized-tdk-cli
export NAMESPACE=synthesized-tdk-cli

Set up the image tag. Example:

export TAG="1.32.0"

Configure the container images:

export IMAGE_REGISTRY="gcr.io/synthesized-marketplace-public/synthesized-tdk-cli"

(Optional) Set compute resource limits:

export RESOURCES_LIMITS_CPU=1
export RESOURCES_LIMITS_MEMORY=1Gi

If you use a different namespace than the default, create a new namespace by running the following command:

kubectl create namespace "${NAMESPACE}"

Configuring database URLs and credentials

Replace the placeholder values and run the following. The single quotes keep the shell from interpreting characters such as &, ;, ? or $ that often appear in JDBC URLs and passwords:

export SYNTHESIZED_INPUT_URL='{YOUR_INPUT_JDBC_URL}'
export SYNTHESIZED_INPUT_USERNAME='{YOUR_INPUT_USERNAME}'
export SYNTHESIZED_INPUT_PASSWORD='{YOUR_INPUT_PASSWORD}'
export SYNTHESIZED_OUTPUT_URL='{YOUR_OUTPUT_JDBC_URL}'
export SYNTHESIZED_OUTPUT_USERNAME='{YOUR_OUTPUT_USERNAME}'
export SYNTHESIZED_OUTPUT_PASSWORD='{YOUR_OUTPUT_PASSWORD}'
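If any of these variables is unset, the shell expands it to an empty string and the later helm command receives an empty value without complaint. A minimal sketch of a guard against that follows; the helper name `require_vars` is hypothetical.

```shell
# Succeed only if every named variable is set to a non-empty value.
require_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      status=1
    fi
  done
  return "$status"
}

require_vars \
  SYNTHESIZED_INPUT_URL SYNTHESIZED_INPUT_USERNAME SYNTHESIZED_INPUT_PASSWORD \
  SYNTHESIZED_OUTPUT_URL SYNTHESIZED_OUTPUT_USERNAME SYNTHESIZED_OUTPUT_PASSWORD \
  || echo "Export the database variables before expanding the manifest." >&2
```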

Creating Synthesized transformation configuration

Create the configuration file:

touch synthesized_config.yaml

Fill synthesized_config.yaml with a MASKING, GENERATION, or KEEP configuration, for example:

default_config:
  mode: MASKING
safety_mode: RELAXED
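Creating and filling the file can also be done in one step with a here-document. This sketch writes the same example configuration shown above and overwrites any existing synthesized_config.yaml in the current directory.

```shell
# Write the example configuration in one step.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > synthesized_config.yaml <<'EOF'
default_config:
  mode: MASKING
safety_mode: RELAXED
EOF

# Show the result for a quick visual check.
cat synthesized_config.yaml
```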

Configure schedule

(Optional) Set the cron schedule on which TDK runs. The following value names February 31, a date that never occurs, so it disables scheduled runs:

export SCHEDULE="* * 31 2 *"
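The schedule uses the standard five-field cron layout: minute, hour, day of month, month, and day of week. As a hedged sketch, the check below confirms only that a value has five fields; it does not validate the fields themselves. The helper name `is_five_field_cron` and the example schedule are hypothetical.

```shell
# Succeed only if the argument has exactly five whitespace-separated fields.
# The body runs in a subshell so that disabling globbing does not leak
# into the calling shell.
is_five_field_cron() (
  set -f          # keep "*" from expanding to file names
  set -- $1       # intentionally unquoted: split the schedule into fields
  [ "$#" -eq 5 ]
)

# Hypothetical schedule: every day at 02:00.
example_schedule="0 2 * * *"
if is_five_field_cron "$example_schedule"; then
  echo "well-formed: $example_schedule"
fi
```

Always quote the schedule when exporting or passing it, as in the export above; an unquoted * would be expanded by the shell into file names.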

Expanding the manifest template

Use helm template to expand the template. We recommend that you save the expanded manifest file for future updates to your app.

helm template helm/synthesized-tdk-cli \
  --name-template "${APP_INSTANCE_NAME}" \
  --namespace "${NAMESPACE}" \
  --set image.repository="${IMAGE_REGISTRY}" \
  --set image.tag="${TAG}" \
  --set schedule="${SCHEDULE}" \
  --set resources.limits.cpu="${RESOURCES_LIMITS_CPU}" \
  --set resources.limits.memory="${RESOURCES_LIMITS_MEMORY}" \
  --set-file env.SYNTHESIZED_USERCONFIG="synthesized_config.yaml" \
  --set envRenderSecret.SYNTHESIZED_INPUT_URL="${SYNTHESIZED_INPUT_URL}" \
  --set envRenderSecret.SYNTHESIZED_INPUT_USERNAME="${SYNTHESIZED_INPUT_USERNAME}" \
  --set envRenderSecret.SYNTHESIZED_INPUT_PASSWORD="${SYNTHESIZED_INPUT_PASSWORD}" \
  --set envRenderSecret.SYNTHESIZED_OUTPUT_URL="${SYNTHESIZED_OUTPUT_URL}" \
  --set envRenderSecret.SYNTHESIZED_OUTPUT_USERNAME="${SYNTHESIZED_OUTPUT_USERNAME}" \
  --set envRenderSecret.SYNTHESIZED_OUTPUT_PASSWORD="${SYNTHESIZED_OUTPUT_PASSWORD}" \
  --set reportingSecret="${APP_INSTANCE_NAME}-reporting-secret" \
  > "${APP_INSTANCE_NAME}_manifest.yaml"

Applying the manifest to your Kubernetes cluster

To apply the manifest to your Kubernetes cluster, use kubectl:

kubectl apply -f "${APP_INSTANCE_NAME}_manifest.yaml" --namespace "${NAMESPACE}"
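The expanded manifest can also be inspected, before or after applying it, to see which resource kinds it declares. The sketch below lists the distinct top-level kinds; the helper name `manifest_kinds` is hypothetical. Given the rest of this guide, a CronJob would be expected among them, but the exact set depends on the chart.

```shell
# Print the distinct top-level "kind:" values found in a manifest file.
# Indented (nested) "kind:" keys are ignored because the pattern is
# anchored at the start of the line.
manifest_kinds() {
  sed -n 's/^kind:[[:space:]]*//p' "$1" | sort -u
}

manifest="${APP_INSTANCE_NAME:-synthesized-tdk-cli}_manifest.yaml"
if [ -f "$manifest" ]; then
  manifest_kinds "$manifest"
else
  echo "manifest not found: $manifest" >&2
fi
```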

Viewing your app in the Google Cloud Console

To get the Cloud Console URL for your app, run the following command:

echo "https://console.cloud.google.com/kubernetes/application/${ZONE}/${CLUSTER}/${NAMESPACE}/${APP_INSTANCE_NAME}"

To view the app, open the URL in your browser.

Using the app

How to use TDK

After Synthesized TDK is installed, you can either run the job manually or wait for the CronJob to be triggered by its schedule.

To trigger the CronJob manually, run:

kubectl create job --from=cronjob/${APP_INSTANCE_NAME}-cron ${APP_INSTANCE_NAME} -n ${NAMESPACE}

To follow the logs of the job (replace {JOB_NAME} with the job name from the previous step):

kubectl logs -f jobs/{JOB_NAME} -n ${NAMESPACE}
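The manual trigger above always creates a job with the same name, so a second run fails while the first job still exists. One way around that, sketched below, is to append a timestamp to the name. The helper names `manual_job_name` and `run_tdk_once` and the `-manual-` infix are hypothetical, and two calls within the same second would still produce the same name.

```shell
# Hypothetical helper: build a job name that changes from second to second
# by appending the current Unix timestamp to the given prefix.
manual_job_name() {
  printf '%s-manual-%s\n' "$1" "$(date +%s)"
}

# Create a one-off job from the CronJob and print the chosen name on stdout.
# kubectl's own output goes to stderr so that only the name is captured.
# Not called here because it needs kubectl and a reachable cluster.
run_tdk_once() {
  job="$(manual_job_name "${APP_INSTANCE_NAME}")"
  kubectl create job --from="cronjob/${APP_INSTANCE_NAME}-cron" "$job" \
    -n "${NAMESPACE}" >&2 && echo "$job"
}

# Usage, once the cluster is reachable:
#   job="$(run_tdk_once)"
```

The printed name can then be passed to the kubectl logs command shown above.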

App metrics

At the moment, the application does not support exporting Prometheus metrics and does not have any exporter.

Uninstalling the app

Using the Google Cloud Console

  1. In the Cloud Console, open Kubernetes Applications.

  2. From the list of apps, choose your app installation.

  3. On the Application Details page, click Delete.

Using the command line

Preparing your environment

Set your installation name and Kubernetes namespace:

export APP_INSTANCE_NAME=synthesized-tdk-cli
export NAMESPACE=synthesized-tdk-cli

Deleting your resources

NOTE: We recommend using a kubectl version that is the same as the version of your cluster. Using the same version for kubectl and the cluster helps to avoid unforeseen issues.

Deleting the deployment with the generated manifest file

Run kubectl on the expanded manifest file:

kubectl delete -f ${APP_INSTANCE_NAME}_manifest.yaml --namespace ${NAMESPACE}

Deleting the deployment by deleting the Application resource

If you don't have the expanded manifest file, delete the resources by using types and a label:

kubectl delete application,secret,cronjob,job \
  --namespace ${NAMESPACE} \
  --selector name=${APP_INSTANCE_NAME}