cloudposse-archives/goofys

Goofys on K8s - Error While Creating Mount Source Path: File Exists

Hi All,

I've been stuck on this issue for a handful of days now and am not sure how to fix it. I am using this image on K8s. Everything works well, but after some period of time (usually a few hours) I run into the same issue.

Here is the pod/container status that I end up with:

lastState:
  terminated:
    containerID: docker://58cbfada1b7ef57e87dc3aa2e459f3e80a980801420d831e90d88b3bf8d76d01
    exitCode: 128
    finishedAt: "2021-02-09T13:31:05Z"
    message: 'error while creating mount source path ''/var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount'':
      mkdir /var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount:
      file exists'
    reason: ContainerCannotRun
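
In case it helps with diagnosis: my suspicion is that a stale FUSE mount gets left behind at the emptyDir path when goofys dies, which would explain the mkdir failure. With shell access to the node, that could be checked roughly like this (untested sketch; paths copied from the error message above):

# on the node that ran the pod
findmnt | grep full-logs-mount
stat /var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount
# a stale FUSE mount typically makes stat fail with
# "Transport endpoint is not connected"; a lazy unmount should clear it:
umount -l /var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount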

Here is my k8s spec:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-data-pipeline
  namespace: default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-data-pipeline
      app.kubernetes.io/instance: my-data-pipeline
  serviceName: my-data-pipeline-hs
  podManagementPolicy: 
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-data-pipeline
        app.kubernetes.io/instance: my-data-pipeline
    spec:
      serviceAccountName: 
      containers:
      - name: goofys-full-logs
        image: "cloudposse/goofys:0.4.0"
        imagePullPolicy: Always
        securityContext:
          privileged: true
          runAsUser: 0
        env:
        - name: BUCKET
          value: my-logs.bucket.com
        - name: MOUNT_DIR
          value: /mnt/s3/full_logs
        - name: REGION
          value: us-east-1
        - name: DIR_MODE
          value: "0777"
        - name: FILE_MODE
          value: "0777"
        volumeMounts:
        - name: full-logs-mount
          mountPath: /mnt/s3/full_logs
          mountPropagation: Bidirectional
        resources:
          limits:
            cpu: "2"
            memory: 8Gi
          requests:
            cpu: "1"
            memory: 6Gi
        # lifecycle:
        #   preStop:
        #     exec:
        #       command: ["/bin/sh","-c","umount -f /mnt/s3/full_logs"]
      - name: goofys-archive-logs
        image: "cloudposse/goofys:0.4.0"
        imagePullPolicy: Always
        securityContext:
          privileged: true
          runAsUser: 0
        env:
        - name: BUCKET
          value: my-data.bucket.com
        - name: MOUNT_DIR
          value: /mnt/s3/archived_logs
        - name: REGION
          value: us-east-1
        - name: DIR_MODE
          value: "0777"
        - name: FILE_MODE
          value: "0777"
        volumeMounts:
        - name: archived-logs-mount
          mountPath: /mnt/s3/archived_logs
          mountPropagation: Bidirectional
        resources:
          limits:
            cpu: "2"
            memory: 8Gi
          requests:
            cpu: "1"
            memory: 6Gi
        # lifecycle:
        #   preStop:
        #     exec:
        #       command: ["/bin/sh","-c","umount -f /mnt/s3/archived_logs"]
      - name: pipeline-runner
        securityContext:
          privileged: true
          runAsUser: 0
        image: "my-data-service:1.0.0"
        imagePullPolicy: Always
        command:
        env:
        - name: FULL_LOGS_MOUNT_DIR
          value: /mnt/full_logs
        - name: ARCHIVED_LOGS_MOUNT_DIR
          value: /mnt/archived_logs
        - name: PIPELINE_WORKING_DIR
          value: /mnt/pipeline_working_disk
        volumeMounts:
        - name: full-logs-mount
          mountPath: /mnt/full_logs
          mountPropagation: Bidirectional
        - name: archived-logs-mount
          mountPath: /mnt/archived_logs
          mountPropagation: Bidirectional
        - name: pipeline-working-disk
          mountPath: /mnt/pipeline_working_disk
        resources:
          limits:
            cpu: "2"
            memory: 3Gi
          requests:
            cpu: "1"
            memory: 2Gi
      volumes:
      - name: full-logs-mount
        emptyDir: {}
      - name: archived-logs-mount
        emptyDir: {}
      restartPolicy: Always
      nodeSelector:
        workload: log-pipeline
  volumeClaimTemplates:
    - metadata:
        name: pipeline-working-disk
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: "200Gi"

Essentially, there are two buckets that need to be mounted for the pipeline runner: one bucket is read from, that data is processed, and the results are written to the other. The bucket that is read from is the one that consistently fails after some period of time with the error above. I'm not sure what causes it or why; it works well for a few hours, but it always ends up in the same crashed state.
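
One mitigation I've been considering is re-enabling the commented-out preStop hooks in the spec above, so that the bucket is unmounted cleanly before a goofys container exits, e.g. for the full-logs container:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "umount -f /mnt/s3/full_logs"]

My understanding, though, is that preStop only fires on graceful termination, so it wouldn't help if goofys crashes on its own.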

Also, the bucket it fails on holds a LOT of data going back multiple years (thousands of TBs). I'm not sure whether that has anything to do with it or not.
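
Given the bucket size, one thing I still want to rule out is memory pressure, i.e. whether the goofys container gets OOM-killed before the pod lands in the ContainerCannotRun state. A rough check (pod name assumed to be my-data-pipeline-0, the first StatefulSet replica):

kubectl describe pod my-data-pipeline-0 -n default | grep -A 5 'Last State'
# on the node, the kernel log would also record any OOM kill:
journalctl -k | grep -i 'out of memory'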

This may be an upstream https://github.com/kahing/goofys/issues thing, but I thought I would start here since this is where the image came from (along with the example K8s implementation). Any help is appreciated!