kubernetes/examples

fsGroup securityContext does not apply to nfs mount

kmarokas opened this issue · 65 comments

The example https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs works fine if the container using the NFS mount runs as root. If I use a securityContext to run as a non-root user, I have no write access to the mounted volume.

How to reproduce:
Here is the nfs-busybox-rc.yaml with a securityContext added:

# This mounts the nfs volume claim into /mnt and continuously
# overwrites /mnt/index.html with the time and hostname of the pod.

apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-busybox
spec:
  replicas: 2
  selector:
    name: nfs-busybox
  template:
    metadata:
      labels:
        name: nfs-busybox
    spec:
      securityContext:
        runAsUser: 10000
        fsGroup: 10000
      containers:
      - image: busybox
        command:
          - sh
          - -c
          - 'while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done'
        imagePullPolicy: IfNotPresent
        name: busybox
        securityContext:
          runAsUser: 10000
        volumeMounts:
          # name must match the volume name below
          - name: nfs
            mountPath: "/mnt"
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs

Actual result:

kubectl exec nfs-busybox-2w9bp -t -- id
uid=10000 gid=0(root) groups=10000

kubectl exec nfs-busybox-2w9bp -t -- ls -l /
total 48
<..>
drwxr-xr-x    3 root     root          4096 Aug  2 12:27 mnt

Expected result:
the group ownership of the /mnt folder should be group 10000 (matching the fsGroup)

Mount options other than rw are not allowed on the NFS PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # FIXME: use the right IP
    server: 10.23.137.115
    path: "/"
  mountOptions:
#    - rw              # allowed
#    - root_squash     # error during pod scheduling: mount.nfs: an incorrect mount option was specified
#    - all_squash      # error during pod scheduling: mount.nfs: an incorrect mount option was specified
#    - anonuid=10000   # error during pod scheduling: mount.nfs: an incorrect mount option was specified
#    - anongid=10000   # error during pod scheduling: mount.nfs: an incorrect mount option was specified
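
Note that root_squash, all_squash, anonuid and anongid are server-side export options, not client-side mount options, which is why mount.nfs rejects them. They belong in the export definition on the NFS server instead; a sketch of /etc/exports, assuming the server exports / as in the PV above (IDs are illustrative):

/  *(rw,all_squash,anonuid=10000,anongid=10000)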
kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.3-rancher1", GitCommit:"f6320ca7027d8244abb6216fbdb73a2b3eb2f4f9", GitTreeState:"clean", BuildDate:"2018-05-29T22:28:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Why did this get closed with no resolution? I have this same issue. If there is a better solution than an init container, please someone fill me in.

Yeah... I'm having the same issue with NFS too. securityContext.fsGroup seems to have no effect on NFS volume mounts, so you kinda have to use the initContainer approach :(

I'm having the same problem.

Same issue: able to write but not able to read from the NFS-mounted volume. Kubernetes reports the mount as successful, but no luck.

/reopen

@varun-da: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/reopen

@kmarokas: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

thanks @kmarokas!

/remove-lifecycle rotten

Would love for this to be addressed! In the meantime, here's how we're dealing with it...

In this example there are two pods that mount an AWS EFS volume via NFS. To enable a non-root user to write, we make the mount point accessible via an initContainer.

---
apiVersion: v1
kind: Pod
metadata:
  name: alpine-efs-1
  labels:
    name: alpine
spec:
  volumes:
  - name: nfs-test
    nfs:
      server: fs-xxxxxxxx.efs.us-east-1.amazonaws.com
      path: /
  securityContext:
    fsGroup: 100
    runAsGroup: 100
    runAsUser: 405
  initContainers:
    - name: nfs-fixer
      image: alpine
      securityContext:
        runAsUser: 0
      volumeMounts:
      - name: nfs-test
        mountPath: /nfs
      command:
      - sh
      - -c
      - (chmod 0775 /nfs; chgrp 100 /nfs)
  containers:
  - name: alpine
    image: alpine
    volumeMounts:
      - name: nfs-test
        mountPath: /nfs
    command:
      - tail
      - -f
      - /dev/null
---
apiVersion: v1
kind: Pod
metadata:
  name: alpine-efs-2
  labels:
    name: alpine
spec:
  volumes:
  - name: nfs-test
    nfs:
      server: fs-xxxxxxxx.efs.us-east-1.amazonaws.com
      path: /
  securityContext:
    supplementalGroups:
      - 100
    fsGroup: 100
    # runAsGroup: 100
    runAsUser: 405
  initContainers:
    - name: nfs-fixer
      image: alpine
      securityContext:
        runAsUser: 0
      volumeMounts:
      - name: nfs-test
        mountPath: /nfs
      command:
      - sh
      - -c
      - (chmod 0775 /nfs; chgrp 100 /nfs)
  containers:
  - name: alpine
    image: alpine
    volumeMounts:
      - name: nfs-test
        mountPath: /nfs
    command:
      - tail
      - -f
      - /dev/null

The same seems to be true for cifs mounts created through a custom volume driver: juliohm1978/kubernetes-cifs-volumedriver#8

Edit: it looks like Kubernetes does very little magic when mounting volumes. The individual volume drivers have to respect the fsGroup configuration set in the pod, and the NFS provider doesn't do that as of now.

Is https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client the place where this could be fixed?
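
For what it's worth, for CSI-based volumes whether kubelet applies fsGroup at all is advertised by the driver itself, via the fsGroupPolicy field of its CSIDriver object. A sketch of what a driver that opts in would register, using the NFS CSI driver's name as an example (check the actual driver's manifests for what it really sets):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: nfs.csi.k8s.io
spec:
  # "File" tells kubelet to always apply the pod's fsGroup to the volume's
  # contents; other values are "None" and "ReadWriteOnceWithFSType" (the default).
  fsGroupPolicy: File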

No solution after around one and a half years? Can't believe it.


Maybe this issue needs to be taken to another repository. Is https://github.com/kubernetes-incubator/external-storage the right place for it?

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/

fsGroupChangePolicy: "Always"

Refer to the link above. But it seems the feature is only available from Kubernetes 1.18 onward, if I'm not mistaken.

fsGroupChangePolicy: "Always"

The docs are not totally clear about this, but I understand that this is already the default behaviour.

By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted.

The section also indicates that not every volume type necessarily supports changing permissions:

This field only applies to volume types that support fsGroup controlled ownership and permissions.
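
For volume types that do support it, the policy only controls when the recursive chown happens. A minimal sketch of a pod-level securityContext using it (note this still has no effect on plain NFS mounts, since the NFS plugin ignores fsGroup entirely):

  securityContext:
    runAsUser: 10000
    fsGroup: 10000
    # "OnRootMismatch" skips the recursive walk when the volume root already
    # has the expected owner and permissions; "Always" is the default.
    fsGroupChangePolicy: "OnRootMismatch"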

+1

The same issue for AWS EBS gp2 volumes

+1

I just ran into this issue today as well. Is there any workaround yet besides using an initContainer?

+1 - facing this issue too!

+1 - facing this issue

  • Block storage (e.g. iSCSI, Ceph RBD, ...): use fsGroup to control access
  • Shared storage (e.g. NFS, GlusterFS): use supplementalGroups instead

Give me a like if I saved your day.
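
For example, if the export on the NFS server side is already group-owned by GID 3000, a sketch like this (IDs are illustrative) lets a non-root pod write without any chown; supplementalGroups just adds the GID to the container processes and, unlike fsGroup, never tries to change ownership, so the server-side ownership has to already be correct:

  securityContext:
    runAsUser: 2000
    supplementalGroups:
      - 3000   # must match the group that owns the export on the server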


Has anyone been able to do this without an init container? Really hoping to avoid that if possible.

I would rather avoid init containers; in fact, I don't like any kind of scripts in the k8s manifests!
So if you don't want to use an init container, you can do something like this:

e.g. for NFS volumes (this assumes you have control over the NFS server):
On the NFS server, have something like this in /etc/exports:
/srv/vol1 *(rw,sync,all_squash,insecure,no_subtree_check,anonuid=2000,anongid=3000)
Then create /srv/vol1 to match the above:

sudo chown -R 2000:3000 /srv/vol1
sudo chmod -R 775 /srv/vol1

In the pod, use a securityContext to match the above:

  securityContext:
    runAsUser: 2000
    runAsGroup: 3000   # matches anongid on the export
    fsGroup: 3000
    fsGroupChangePolicy: Always

I like this better than allowing init containers to run as root (a PodSecurityPolicy may prevent that as well).
This is also a better way of dishing out volumes with a well-known uid:gid that anyone can predictably use.

e.g. you can also use the same technique with hostPath-based volumes.
On a k8s host, have a dir, e.g. /data/vol1, with matching permissions:

sudo chown -R 2000:3000 /data/vol1
sudo chmod -R 775 /data/vol1
ls -l /data/
total 4
drwxrwxr-x 2 2000 3000 4096 Jul  3 01:51 vol1

Alternatively, if you want to use managed persistent volumes like AWS/GCP/portworxVolume etc., it will depend on whether they support fsGroup.

I disabled all sudo privileges for pod users for security reasons.
So I can't configure the permissions of the mount point, because Kubernetes won't let me,
and I can't chown/chmod the mount point, because my pod user can't sudo.
How do I solve this problem?



+1 - facing this issue


Hit this issue too. For a POC I just attached my share to a VM and manually chowned it, but for prod that's probably not OK.

Just recently set up a cluster on Linode and I can't believe it; this feels incomplete. I know this has nothing to do with Linode, but I just want to add some context. The primary way to mount a PVC on Linode is to buy their volumes, which require a minimum of 10 GB and are limited to 8 per Linode. I thought I was smart when I found the Rook NFS workaround. Everything would have been perfect, except none of my databases could be provisioned because I kept getting a permission denied error. Looking deeper into it, I came across this issue. Because I am using a Postgres operator (I tried Kubegres and PGO), there doesn't seem to be a way to specify an init container. This means that every time I provision a database (or a replica), I need to shell into my Linode, find the PVC and manually change the permissions. I really appreciate the work the community has done on Kubernetes, and the fact that it is FOSS, but this really seems like an enormous issue that is being completely ignored.

True, this is something that should be fixed 👍

Looks like it is working for me when specifying all of runAsUser, runAsGroup and fsGroup (version 1.24.1).



Can anyone else confirm what @ramihoudroge said, that 1.24.1 works?

I've also found this thread, https://devops.stackexchange.com/questions/13939/how-to-allow-a-non-root-user-to-write-to-a-mounted-efs-in-eks, which mentions EFS access points.
Has anyone had success with this?
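
An EFS access point can enforce a POSIX owner (uid/gid) and an owned root directory on the server side, so nothing needs to chown inside the pod. A sketch of a static PV using the EFS CSI driver, with placeholder IDs (I haven't verified this myself, so treat it as a starting point):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: efs.csi.aws.com
    # "<filesystem id>::<access point id>"; both IDs are placeholders
    volumeHandle: fs-12345678::fsap-0123456789abcdef0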


Also having this issue with permission denied, with a MongoDB container NFS-mounting an EFS volume in AWS.
Using EKS 1.24 with AWS EFS.
https://stackoverflow.com/questions/75670387/error-executing-postinstallation-eperm-operation-not-permitted-utime-bitn


I ran into this exact issue with a static PV using the default NFS mount.
There are no NFS mount options that can change the permissions, and the securityContext.fsGroup setting is ignored without any output.
Unfortunately, the initContainer approach is not an option for me.
Can anything be done about this issue?

@yingding have you found any workaround?

@radirobi97 If you can use the initContainers approach from #260 (comment), it will work.
I still had this issue with a pod from an ML system which I do not have control over. Ultimately, I switched to object storage and gave up on the default NFS mount of a static PV.
But I think the dynamic NFS CSI driver should not have this static-PV issue.
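
For reference, with the CSI NFS driver (kubernetes-csi/csi-driver-nfs) volumes are provisioned dynamically from a StorageClass roughly like the sketch below (server and share are placeholders); whether fsGroup is then honored depends on the fsGroupPolicy the driver registers:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.example.com   # placeholder NFS server
  share: /export                   # placeholder exported path
reclaimPolicy: Delete
mountOptions:
  - nfsvers=4.1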


The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/remove-lifecycle rotten

/reopen

@rmunn: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.