acryldata/datahub-actions

Permission error while accessing /tmp/datahub

upendrao opened this issue · 6 comments

Problem description:
The datahub-actions component fails to process incoming ingestion requests because it cannot create the ingestion recipe YAML file in the /tmp/datahub/ingest folder.

Environment:

  • datahub helm chart v0.2.144
  • Datahub 0.9.6.1
  • datahub-actions v0.0.8
  • Kubernetes v1.26.0

How to reproduce?
Run datahub-actions in k8s v1.26.0 with the following securityContext configuration.

acryl-datahub-actions:
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: "RuntimeDefault"

Observations:

  • Pod security enforcement in k8s v1.26.0 requires all pods to run with runAsNonRoot: true, which implies one has to specify a user ID in securityContext
  • The /tmp/datahub folder is created and owned by a system user 'datahub', as per the Dockerfile
  • The datahub-actions startup script is run by the user with ID 1000, which overrides the intended USER datahub specified in the Dockerfile
  • As a result, none of the datahub-actions scripts can access the /tmp/datahub folder

Questions?

  • What was the need for a special datahub system user?
  • Why does /tmp/datahub need to be restricted to the datahub user?

We store logs and venv setups in /tmp/datahub during ingestion execution.

We don't have any specific requirements around the datahub system user vs any other setup, but generally wanted to run the processes as non-root and still have that non-root user be able to access the necessary locations on disk.

Would it be possible for you to run as the user id of datahub?

According to the securityContext spec, it is not possible to specify a non-numeric user ID.
Here is the error when no user ID is provided to the container:
Warning Failed 2s (x2 over 3s) kubelet Error: container has runAsNonRoot and image has non-numeric user (datahub), cannot verify user is non-root (pod: "datahub-acryl-datahub-actions-7b48fdf684-6bzc9_datahub(d73d1648-99d5-453f-a244-91ac1520db36)", container: acryl-datahub-actions)
And it is strongly recommended to set runAsNonRoot: true.
This means users are forced to provide a numeric user ID, which may not match the UID that the datahub user resolves to in the container.

$ id 100
uid=100(_apt) gid=65534(nogroup) groups=65534(nogroup)
$ id 101
uid=101(messagebus) gid=102(messagebus) groups=102(messagebus)
$ id 102
uid=102(datahub) gid=103(datahub) groups=103(datahub)

Following your explanation, I see that there is no need to restrict access to the /tmp/datahub folder to the datahub user.
Allowing everyone to read/write to this location would resolve this issue.
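
A deployment-side way to get a similar effect, without changing the image, might be to mount a writable emptyDir volume over /tmp/datahub. The fragment below is only a sketch of a plain Kubernetes pod spec: the volume name `datahub-tmp` and the `fsGroup` value are made up for illustration, and whether the helm chart exposes a way to inject these fields is an assumption that has not been checked here. It is also untested against this image; mounting over /tmp/datahub hides anything the image placed there, so subdirectories such as ingest would have to be created at runtime.

```yaml
# Hypothetical pod spec fragment, not taken from the datahub helm chart
spec:
  securityContext:
    # Illustrative group ID; makes the mounted volume group-writable for the container user
    fsGroup: 1000
  containers:
    - name: acryl-datahub-actions
      volumeMounts:
        # Shadows the image's /tmp/datahub with a fresh, writable directory
        - name: datahub-tmp
          mountPath: /tmp/datahub
  volumes:
    - name: datahub-tmp
      emptyDir: {}
```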

Given that datahub is uid 102, would it be possible to set runAsUser: 102?
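
For reference, that suggestion would look roughly like the values below. This is a sketch that reuses the securityContext from the issue description and only swaps in the IDs from the `id 102` output above. The `runAsGroup` line is an addition, and both numbers are assumptions tied to this particular image build.

```yaml
acryl-datahub-actions:
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
    # Assumption: 102 is the uid of the datahub user in this image build
    runAsUser: 102
    # Assumption: 103 is the matching gid reported by `id 102`
    runAsGroup: 103
    seccompProfile:
      type: "RuntimeDefault"
```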

Assuming the datahub user's ID is just a workaround that I tested.
But I don't think it is a solution, as we cannot rely on that ID: it was 100 in a previous release and is now 102.

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

This issue was closed because it has been inactive for 30 days since being marked as stale.