sdc-mapr-k8s

This project builds a custom Docker image of StreamSets Data Collector (SDC) with MapR v6.1 client and Kubernetes deployment support.

Configure and Build the Image

Switch to the sdc-mapr-docker directory.

Edit build.sh and make these changes:

  • Set the image name

  • Edit the list of stage libraries set in the SDC_STAGE_LIBS env var. Make sure to include the streamsets-datacollector-mapr_6_1-lib and streamsets-datacollector-mapr_6_1-mep6-lib stage libs. I've included the streamsets-datacollector-jython_2_7-lib library as an example.

  • Edit the list of Enterprise stage libraries set in the SDC_ENTERPRISE_STAGE_LIBS env var. I've included the streamsets-datacollector-snowflake-lib-1.5.0 as an example.

Execute the build.sh script to build the image and push it to DockerHub.
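Before running the build, it can help to confirm that the two required MapR stage libraries are actually in the list. The following is a minimal sketch, not part of the repository's build.sh: the check_stage_libs function is hypothetical, and it assumes SDC_STAGE_LIBS holds a comma-separated list of library names.

```shell
# Hypothetical pre-build check; not part of the repository's build.sh.
# Assumes SDC_STAGE_LIBS is a comma-separated list of stage library names.
REQUIRED_LIBS="streamsets-datacollector-mapr_6_1-lib streamsets-datacollector-mapr_6_1-mep6-lib"

# Return 0 if every required MapR library appears in the list passed as $1.
check_stage_libs() {
  wrapped=",$1,"
  for lib in $REQUIRED_LIBS; do
    case "$wrapped" in
      *",$lib,"*) ;;
      *) echo "missing stage lib: $lib" >&2; return 1 ;;
    esac
  done
  return 0
}

SDC_STAGE_LIBS="streamsets-datacollector-mapr_6_1-lib,streamsets-datacollector-mapr_6_1-mep6-lib,streamsets-datacollector-jython_2_7-lib"
check_stage_libs "$SDC_STAGE_LIBS" && echo "stage libs OK"
```

Wrapping the list in commas means a library name only matches as a whole entry, never as a substring of another entry.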

Create Kubernetes Secrets for a MapR Service Ticket and Truststore

Generate or obtain a long-lived MapR service ticket and place it in the sdc-mapr-k8s/resources directory (not in the similarly named Resources directory under sdc-mapr-docker). The file must be named longlived_ticket.

Execute the script create-mapr-ticket-secret.sh to create a Secret for the ticket.
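For orientation, here is a rough sketch of what creating such a Secret could look like. It is not the contents of create-mapr-ticket-secret.sh: the Secret name mapr-ticket and the ticket_secret_cmd function are assumptions made for illustration. The function only prints the kubectl command, so it is safe to run without a cluster.

```shell
# Hypothetical sketch; the repository's create-mapr-ticket-secret.sh may differ.
# SECRET_NAME is an assumed name, not taken from the project.
SECRET_NAME="mapr-ticket"
TICKET_FILE="resources/longlived_ticket"

# Print the kubectl command that would create the Secret, or fail if the
# ticket file is missing or empty.
ticket_secret_cmd() {
  if [ ! -s "$1" ]; then
    echo "ticket file not found or empty: $1" >&2
    return 1
  fi
  echo "kubectl create secret generic $SECRET_NAME --from-file=longlived_ticket=$1"
}

# Run from the sdc-mapr-k8s directory; pipe the output to sh to execute it.
ticket_secret_cmd "$TICKET_FILE" || echo "place the ticket in $TICKET_FILE first"
```

Checking for the file up front avoids creating a Secret from the wrong directory, which is the mix-up the note about the two resources directories warns against.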

You must provide an ssl_truststore file for the MapR client. If you are connecting to a cluster deployed with self-signed certs, place a MapR ssl_truststore file in the sdc-mapr-k8s/resources directory. If not, you can use a copy of a JDK cacerts file renamed to ssl_truststore.

Execute the script create-mapr-truststore-secret.sh to create a Secret for the Truststore.
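If you take the JDK cacerts route, the file's location depends on the Java version. The sketch below shows one way to locate and copy it; the find_cacerts helper is hypothetical and not part of the repository, and it assumes JAVA_HOME points at the Java installation you want to use.

```shell
# Hypothetical helper, not part of the repository.
# Print the path of the cacerts file under a given Java home:
# lib/security/cacerts on JDK 9 and later, jre/lib/security/cacerts on a JDK 8 home.
find_cacerts() {
  for candidate in "$1/lib/security/cacerts" "$1/jre/lib/security/cacerts"; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no cacerts found under $1" >&2
  return 1
}

# Run from the sdc-mapr-k8s directory: copy the file into place under the
# name the MapR client expects. Does nothing if JAVA_HOME is unset.
if [ -n "${JAVA_HOME:-}" ] && src=$(find_cacerts "$JAVA_HOME"); then
  mkdir -p resources
  cp "$src" resources/ssl_truststore
fi
```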

Set Deployment Properties

Edit the file sdc-mapr-dep.yaml and set the MAPR_CLIENT_CONFIG environment variable to the arguments needed to configure the MapR client for the target cluster. The value is passed to the /opt/mapr/server/configure.sh command when the container starts. For example, in my environment I use the string "-N mark.mapr -c -secure -C 10.10.60.182:7222".
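The example string breaks down into a cluster name (-N), a client-only setup flag (-c), the secure-cluster flag (-secure), and the CLDB host and port (-C). A small sketch of composing it for another cluster follows; the mapr_client_config helper is hypothetical and only reproduces the shape of the example above.

```shell
# Hypothetical helper for composing the MAPR_CLIENT_CONFIG value.
# $1 = cluster name (-N), $2 = comma-separated CLDB host:port list (-C).
# -c requests a client-only configuration; -secure targets a secure cluster.
mapr_client_config() {
  echo "-N $1 -c -secure -C $2"
}

MAPR_CLIENT_CONFIG=$(mapr_client_config "mark.mapr" "10.10.60.182:7222")
echo "$MAPR_CLIENT_CONFIG"
# prints: -N mark.mapr -c -secure -C 10.10.60.182:7222
```

Drop -secure from the template if the target cluster is not secured.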

Also set a value for the SDC_CONF_SDC_BASE_HTTP_URL environment variable for Control Hub-based deployments.
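As background, the stock StreamSets Docker image maps environment variables prefixed with SDC_CONF_ onto sdc.properties keys at startup. Assuming this custom image keeps that behavior, the name translation can be sketched as below; the sdc_conf_key function is hypothetical and exists only to illustrate the convention.

```shell
# Sketch of the naming convention used by the stock StreamSets Docker image's
# entrypoint (assumed here to apply to this custom image as well):
# strip the SDC_CONF_ prefix, lowercase the rest, and turn "_" into ".".
sdc_conf_key() {
  echo "${1#SDC_CONF_}" | tr '[:upper:]' '[:lower:]' | tr '_' '.'
}

sdc_conf_key "SDC_CONF_SDC_BASE_HTTP_URL"
# prints: sdc.base.http.url
```

Under that assumption, the variable sets the sdc.base.http.url property, which should be the address at which users and Control Hub can reach this Data Collector.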

Launch the Deployment

Launch the Deployment with Control Hub and a Control Agent, using sdc-mapr-dep.yaml. An sdc-mapr-svc.yaml file is included for a NodePort Service if needed. Ingress can be configured instead if desired.
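If you use the NodePort Service, creating it is a standard kubectl apply. The sketch below wraps the commands in a hypothetical run function that only prints them unless DRY_RUN=0 is set, so it can be tried without a cluster; it assumes kubectl is already pointed at the target cluster and namespace.

```shell
# Hypothetical wrapper: prints each command by default and only executes it
# when DRY_RUN=0 is set, so the sketch is safe to run without a cluster.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the NodePort Service, then list Services to see the assigned port.
run kubectl apply -f sdc-mapr-svc.yaml
run kubectl get svc
```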

Run a Pipeline

If all goes well, you should be able to reach the newly deployed SDC's UI. No additional steps are needed to authenticate to MapR, because the MapR client is already initialized and the service ticket is in the expected location.

Create a pipeline that reads from MapRfs as a test:

(Screenshot: pipeline)

Preview the pipeline to inspect the data being read:

(Screenshot: preview)

Run the pipeline:

(Screenshot: read-from-mapr)

Write to MapRfs:

(Screenshot: write-to-mapr)

Implementation Notes