This project builds a custom Docker image of StreamSets Data Collector (SDC) with MapR v6.1 client and Kubernetes deployment support.
Switch to the sdc-mapr-docker directory.
Edit build.sh and make these changes:
- Set the image name.
- Edit the list of stage libraries set in the SDC_STAGE_LIBS env var. Make sure to include the streamsets-datacollector-mapr_6_1-lib and streamsets-datacollector-mapr_6_1-mep6-lib stage libs. I've included the streamsets-datacollector-jython_2_7-lib library as an example.
- Edit the list of Enterprise stage libraries set in the SDC_ENTERPRISE_STAGE_LIBS env var. I've included streamsets-datacollector-snowflake-lib-1.5.0 as an example.
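As a rough illustration of the edits above, the relevant part of build.sh might end up looking like the sketch below. The variable name IMAGE_NAME, the image value, and the comma-separated list format are assumptions made for this example; check build.sh for the names and separator it actually uses. The helper has_required_mapr_libs is hypothetical and only confirms that both required MapR stage libraries are present.

```shell
# Sketch only: the real variable names and list format are defined in build.sh.

# Hypothetical image name; use your own DockerHub org, repository, and tag.
IMAGE_NAME="myorg/sdc-mapr:latest"

# Stage libraries to include in the image (comma-separated is an assumption).
SDC_STAGE_LIBS="streamsets-datacollector-mapr_6_1-lib,streamsets-datacollector-mapr_6_1-mep6-lib,streamsets-datacollector-jython_2_7-lib"

# Enterprise stage libraries to include in the image.
SDC_ENTERPRISE_STAGE_LIBS="streamsets-datacollector-snowflake-lib-1.5.0"

# Hypothetical helper: succeeds only if both required MapR stage libraries
# appear in the list passed as $1 (comma- or space-separated).
has_required_mapr_libs() {
  libs=" $(printf '%s' "$1" | tr ',' ' ') "
  for required in \
      streamsets-datacollector-mapr_6_1-lib \
      streamsets-datacollector-mapr_6_1-mep6-lib; do
    case "$libs" in
      *" $required "*) ;;
      *) echo "missing stage library: $required" >&2; return 1 ;;
    esac
  done
  return 0
}

if has_required_mapr_libs "$SDC_STAGE_LIBS"; then
  echo "stage library list includes both MapR libraries"
fi
```

Running a check like this before the build can catch a missing MapR library before the image is pushed.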
Execute the build.sh script to build the image and push it to DockerHub.
Generate or obtain a long-lived MapR service ticket and place it in the sdc-mapr-k8s/resources directory (not in the similarly named Resources directory in the sdc-mapr-docker directory). The file must be named longlived_ticket.
Execute the script create-mapr-ticket-secret.sh to create a Secret for the ticket.
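The contents of create-mapr-ticket-secret.sh are not reproduced here, but a script of this kind typically wraps kubectl create secret generic. The sketch below shows that idea under stated assumptions: the Secret name mapr-ticket and the key name longlived_ticket are hypothetical, and the function names are invented for this example. It also checks that the ticket file exists and is non-empty before calling kubectl.

```shell
# Sketch only: the Secret name and key below are assumptions, not values
# taken from create-mapr-ticket-secret.sh.
SECRET_NAME="mapr-ticket"
TICKET_FILE="sdc-mapr-k8s/resources/longlived_ticket"

# Fail early when the ticket file ($1) is missing or empty.
check_ticket() {
  if [ ! -s "$1" ]; then
    echo "ticket not found or empty: $1" >&2
    return 1
  fi
}

# Create a generic Secret named $1 from the ticket file $2.
# Requires kubectl and a current context pointing at the target cluster.
create_ticket_secret() {
  check_ticket "$2" || return 1
  kubectl create secret generic "$1" --from-file=longlived_ticket="$2"
}
```

With these definitions loaded, calling create_ticket_secret "$SECRET_NAME" "$TICKET_FILE" would create the Secret in the current namespace.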
You must provide an ssl_truststore file for the MapR client. If you are connecting to a cluster deployed with self-signed certs, place a MapR ssl_truststore file in the sdc-mapr-k8s/resources directory. If not, you can use a copy of a JDK cacerts file renamed to ssl_truststore.
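For the case where a JDK cacerts file is used, the copy-and-rename step could be scripted roughly as follows. The function name stage_truststore is hypothetical. The two candidate paths cover the usual layouts: lib/security/cacerts in newer JDKs and jre/lib/security/cacerts in a JDK 8 installation.

```shell
# Sketch only: copy a JDK cacerts file to the file name the MapR client expects.
# $1: JDK home directory (for example "$JAVA_HOME")
# $2: destination directory (for example sdc-mapr-k8s/resources)
stage_truststore() {
  for candidate in "$1/lib/security/cacerts" "$1/jre/lib/security/cacerts"; do
    if [ -f "$candidate" ]; then
      cp "$candidate" "$2/ssl_truststore"
      return 0
    fi
  done
  echo "no cacerts file found under $1" >&2
  return 1
}
```

A typical call would be stage_truststore "$JAVA_HOME" sdc-mapr-k8s/resources, run from the repository root.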
Execute the script create-mapr-truststore-secret.sh to create a Secret for the truststore.
Edit the file sdc-mapr-dep.yaml and set the MAPR_CLIENT_CONFIG environment variable with the value needed to configure the MapR client for the target cluster. The value is passed to the /opt/mapr/server/configure.sh command when the Container starts up. For example, in my environment I use the string "-N mark.mapr -c -secure -C 10.10.60.182:7222".
Also set a value for the SDC_CONF_SDC_BASE_HTTP_URL environment variable for Control Hub based deployments.
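To see why MAPR_CLIENT_CONFIG is written as one string of flags, it can help to look at how a shell splits it into separate arguments. The sketch below assumes the entrypoint expands the variable unquoted when it calls configure.sh, which is not confirmed here. The show_args function is a hypothetical stand-in for /opt/mapr/server/configure.sh that prints one argument per line.

```shell
# Example value from the text above; substitute your own cluster name and
# CLDB host:port.
MAPR_CLIENT_CONFIG="-N mark.mapr -c -secure -C 10.10.60.182:7222"

# Stand-in for /opt/mapr/server/configure.sh: prints each argument on its own line.
show_args() {
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

# Left unquoted on purpose so the shell splits the string on whitespace
# and each flag and value arrives as a separate argument.
show_args $MAPR_CLIENT_CONFIG
```

Because the value is split on whitespace under this assumption, individual flag values should not contain spaces.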
Launch the Deployment with Control Hub and Control Agent, using sdc-mapr-dep.yaml. An sdc-mapr-svc.yaml manifest is included for a NodePort Service if needed. Ingress could also be configured if desired.
If all goes well you should be able to reach the newly deployed SDC's UI. No additional steps are needed to authenticate to MapR as the MapR client is initialized and the service ticket is in the expected location.
Create a pipeline that reads from MapRfs as a test.
Preview the pipeline to inspect the data being read.
Run the pipeline.
Write to MapRfs.
- StreamSets' setup-mapr command is called at build time by the Docker Container's sdc-configure.sh script.
- MapR's configure.sh script is called at runtime from the docker-entrypoint.sh script using the MapR cluster's name and URL set in the deployment manifest.