acryldata/datahub-helm

Support ingestion recipes from secrets

7onn opened this issue · 12 comments

7onn commented

Is your feature request related to a problem? Please describe.
The problem is that ingestion recipes are currently loaded from ConfigMaps even though they often contain sensitive values. This makes it harder to manage these files as code.

Describe the solution you'd like
I'd like DataHub's Helm chart to also support Kubernetes Secrets as a source for the ingestion recipe.
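
A hypothetical values.yaml sketch of what this could look like (the `secretRef` field is an assumption for illustration, not an existing chart option):

```yaml
datahub-ingestion-cron:
  enabled: true
  crons:
    dbt:
      schedule: "30 4 * * *"
      recipe:
        # hypothetical: load the recipe from a Secret instead of a ConfigMap
        secretRef: ingestion-dbt-secret
        fileName: pipeline.yml
```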

Describe alternatives you've considered
I suppose I could use an init container with an env var and then rewrite the file mounted from the ConfigMap with envsubst. I'm also aware that DataHub provides its own secrets management solution, but that's not the best approach for us, as we already have another secrets provider in place and load secrets from it via External Secrets.

Additional context
n/a

Notes
I'm interested in developing the feature myself. I'm just waiting for the maintainers' position on this proposal.

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

This issue was closed because it has been inactive for 30 days since being marked as stale.

I see no issue with supporting this configuration using secrets, if you'd like to contribute this I think we would be open to accepting it :)

7onn commented

Great! I'll work on it and let you know later!

This makes it harder to manage these files as code.

Hi,
Thanks for the contribution.
For me, it would be preferable to download secrets directly from HashiCorp Vault.
Would you like to create such functionality?

7onn commented

download secrets directly from HashiCorp Vault

@alplatonov Vault supports annotations that inject secret values directly as env vars into the container. Is that what you want? e.g. https://stackoverflow.com/questions/61239479/injecting-vault-secrets-into-kubernetes-pod-environment-variable

In my case, I have API access enabled, so every ingestion needs a token. I store the token in AWS Secrets Manager and mount it as a Kubernetes Secret named datahub-secrets via an ExternalSecret. So I did the following workaround: I used an initContainer to run envsubst and copy the rewritten ingestion recipe into a volume shared with the actual ingestion container:

        datahub-ingestion-cron:
          enabled: true
          image:
            repository: acryldata/datahub-ingestion
            tag: "v0.11.0.2"
          crons:
            dbt:
              # Daily, at 4:30am
              schedule: "30 4 * * *"
              recipe:
                configmapName: ingestion-dbt
                fileName: pipeline.yml
              serviceAccountName: dbt-irsa
              command: |
                datahub ingest -c /etc/ingestion/dbt.yml
              extraVolumes:
                - name: ingestion
                  emptyDir: {}
              extraVolumeMounts:
                - name: ingestion
                  mountPath: /etc/ingestion
              extraInitContainers:
                - name: recipe-rewriter
                  image: bhgedigital/envsubst:v1.0-alpine3.6
                  env:
                    - name: "INGESTION_TOKEN"
                      valueFrom:
                        secretKeyRef:
                          name: "datahub-secrets"
                          key: "ingestion_token"
                  command: ['sh', '-c', 'envsubst < /etc/recipes/pipeline.yml > /etc/ingestion/dbt.yml']
                  volumeMounts:
                    - name: recipe
                      mountPath: /etc/recipes
                    - name: ingestion
                      mountPath: /etc/ingestion

The actual ingestion recipe (committed to git) looks like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingestion-dbt
data:
  pipeline.yml: |
    pipeline_name: dbt
    sink:
      type: datahub-rest
      config:
        server: http://datahub-datahub-gms:8080
        token: ${INGESTION_TOKEN}
    source:
      type: dbt
      config:
        aws_connection:
...

My idea for the feature is to support this via the Helm chart, i.e. an extra init container that pulls envsubst and rewrites the recipe for the ingestion workload. So, instead of setting env[*].valueFrom in the initContainer, you would be able to just add an annotation, e.g.

vault.hashicorp.com/agent-inject-template-config: |
  {{ with secret "secret/data/mysecret" -}}
    export MY_SECRET="{{ .Data.data.MY_SECRET.MY_SECRET_KEY }}"
  {{- end }}

in the init container, and then envsubst would replace the sensitive value for you at runtime, before DataHub tries to fetch its internal secrets from its own store. Wdyt?
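
Putting the pieces together, the init container could source the Vault-injected file before running envsubst. A sketch under the assumption that the Vault Agent Injector writes the rendered template to /vault/secrets/config (paths and names are illustrative, not chart defaults):

```yaml
extraInitContainers:
  - name: recipe-rewriter
    image: bhgedigital/envsubst:v1.0-alpine3.6
    # source the exports written by the Vault agent, then substitute
    # the env vars into the recipe template
    command:
      - sh
      - -c
      - '. /vault/secrets/config && envsubst < /etc/recipes/pipeline.yml > /etc/ingestion/dbt.yml'
```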

Feel free to copy my workaround for yourself while this is not officially supported <3

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

7onn commented

don't stale it yet

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

7onn commented

how i miss having time for this =(

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

This issue was closed because it has been inactive for 30 days since being marked as stale.