DevNexus 2018: Continuous Delivery for Data Pipelines

This repo demonstrates CI/CD automation for data pipelines orchestrated by Spring Cloud Data Flow (SCDF). The demo assumes at least the 1.3 GA release is in use; the latest release at the time of writing is v1.7.4.

Spring Cloud Stream Processor

The xfmr (a poor man’s name for "transformer") processor consumes an incoming payload, transforms it by adding a string prefix, and sends the processed payload to an output channel for downstream consumption.
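
For context, here is a minimal sketch of what such a processor looks like with the annotation-based Spring Cloud Stream programming model of that era; the class name and prefix value are illustrative, not the repo's actual source.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

@SpringBootApplication
@EnableBinding(Processor.class)
public class XfmrApplication {

    public static void main(String[] args) {
        SpringApplication.run(XfmrApplication.class, args);
    }

    // Consume from the input channel, prefix the payload, and
    // send the result to the output channel. The prefix string
    // here is illustrative, not the repo's actual value.
    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String transform(String payload) {
        return "xfmr-processed: " + payload;
    }
}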

You can build and run the tests via:

mvn clean install

Custom App Registration

Once the application is built and uploaded to a remote repository (e.g., a Maven repository such as Artifactory, or Docker Hub), you can register the application in SCDF.

dataflow:>app register --name xfmr --type processor --uri maven://com.example:xfmr:0.0.3.BUILD-SNAPSHOT
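
If the application were published as a Docker image instead, the registration would use a docker URI; the image coordinates below are hypothetical.

dataflow:>app register --name xfmr --type processor --uri docker:example/xfmr:0.0.3.BUILD-SNAPSHOT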

Along with the custom xfmr processor, this demo also uses the out-of-the-box http source and log sink applications.

Spring Cloud Skipper

As a companion server to SCDF, Skipper manages the granular application lifecycle behind the scenes. You can read more about Skipper in the three-minute tour. In this demo, Skipper is configured with two platform backends: Cloud Foundry (on PWS) and Kubernetes (on GKE). Here’s sample output of the platform-list command; the platform accounts come from Skipper and are available for use in SCDF.

dataflow:>stream platform-list
╔════════╤════════════╤═════════════════════════════════════════════════════════════════════════════════════════╗
║  Name  │    Type    │                                       Description                                       ║
╠════════╪════════════╪═════════════════════════════════════════════════════════════════════════════════════════╣
║k8s-prod│kubernetes  │master url = [https://kubernetes.default.svc/], namespace = [default], api version = [v1]║
║cf-prod │cloudfoundry│org = [scdf-ci], space = [space-sabby], url = [https://api.run.pivotal.io]               ║
╚════════╧════════════╧═════════════════════════════════════════════════════════════════════════════════════════╝
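
For reference, SCDF discovers these platforms through its Skipper client connection. A sketch of starting the local Data Flow server in Skipper mode, assuming a Skipper server running at its default port (exact flags may vary by release):

java -jar spring-cloud-dataflow-server-local-1.7.4.RELEASE.jar \
  --spring.cloud.dataflow.features.skipper-enabled=true \
  --spring.cloud.skipper.client.serverUri=http://localhost:7577/api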

Spring Cloud Data Flow

With the out-of-the-box apps and the xfmr processor registered in SCDF, a streaming pipeline can be defined and deployed to the available platforms; in this setup, that could be Cloud Foundry, Kubernetes, or both. For example, the following deploys the fooxfmr stream to Cloud Foundry via the cf-prod platform account, whose credentials are pre-configured in Skipper. You can read more about the configuration in the reference guide.

dataflow:>stream create fooxfmr --definition "http | xfmr | log"

dataflow:>stream deploy --name fooxfmr --platformName cf-prod
[Image: Streaming Pipeline]
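
Likewise, a second stream can target the Kubernetes backend; the CI pipeline described below assumes a barxfmr stream running there.

dataflow:>stream create barxfmr --definition "http | xfmr | log"

dataflow:>stream deploy --name barxfmr --platformName k8s-prod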

Concourse

Concourse is the CI system in this demo, though any CI tool that supports configuration-as-code could do the same. Here, a Concourse pipeline monitors git commits to kick off the build, test, packaging, and registration of the xfmr application in SCDF via SCDF’s RESTful endpoints. Finally, the CI pipeline invokes the "stream-update" RESTful endpoint (in SCDF) to continuously deploy the incremental changes to the targeted platforms. For more details, review the ci folder, which includes the CI pipeline as code and the associated Concourse job configurations.

Note
This CI pipeline assumes the fooxfmr and barxfmr streams are running in Cloud Foundry and Kubernetes, respectively. The goal of this pipeline is to demonstrate how a business-logic change delivered via a git commit automatically triggers the build, runs the tests, and finally registers and rolling-upgrades the new version against the "live" stream-processing data pipelines.
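
Under the hood, the rolling upgrade amounts to a stream update. The shell equivalent of the REST call the pipeline makes would look roughly like this; the version number is illustrative.

dataflow:>stream update --name fooxfmr --properties "version.xfmr=0.0.4.BUILD-SNAPSHOT"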

You can set up the pipeline with the following command.

fly -t tutorial sp -p xfmr -c ci/pipeline.yml -l ci/credentials.yml

Once set up, we can access the xfmr pipeline from Concourse.

[Image: CI Pipeline]
Note
credentials.yml is intentionally excluded from the repo; you need to create this file with your credentials before setting up the CI pipeline.

Test Data

To simulate incoming data, the data.sh script generates random social security numbers in the xxx-xx-xxxx format. The script invokes the http source application’s URL/route on the platform, with the generated number as the payload of the message envelope.
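
A single event can also be posted by hand; the route below is a placeholder for the http application’s actual URL.

curl -X POST http://<http-app-route> -H "Content-Type: text/plain" -d "123-45-6789"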

Demo Recording

A live demonstration of this setup was presented in a Spring webinar; the demo starts at ~41:25.