Goofys on K8s - Error While Creating Mount Source Path: File Exists
Hi All,
I've been stuck on this issue for a handful of days now and I'm not sure how to fix it. I am using this image on K8s. Everything works well, but after some period of time (usually a few hours) I run into the same issue.
Here is the pod/container status that I end up with:
```yaml
lastState:
  terminated:
    containerID: docker://58cbfada1b7ef57e87dc3aa2e459f3e80a980801420d831e90d88b3bf8d76d01
    exitCode: 128
    finishedAt: "2021-02-09T13:31:05Z"
    message: 'error while creating mount source path ''/var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount'':
      mkdir /var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount:
      file exists'
    reason: ContainerCannotRun
```
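For reference, this is roughly how I've been poking at the node when the pod gets stuck (the pod directory path comes straight from the error message above; I'm only guessing that a stale FUSE mount is the culprit):

```shell
# Path from the error message above (the emptyDir backing the goofys mount).
POD_DIR=/var/lib/kubelet/pods/be45b49d-0efb-4193-b10e-ce3d5c888f8c/volumes/kubernetes.io~empty-dir/full-logs-mount

# A dead goofys process can leave a stale FUSE mount behind: the mount
# still appears in /proc/mounts, while stat on the path fails with
# "Transport endpoint is not connected".
grep fuse /proc/mounts || true
stat "$POD_DIR" 2>&1 || true

# If the path shows up as a stale mount, a lazy unmount lets kubelet
# recreate the directory on the next container start:
# umount -l "$POD_DIR"
```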
Here is my k8s spec:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-data-pipeline
  namespace: default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-data-pipeline
      app.kubernetes.io/instance: my-data-pipeline
  serviceName: my-data-pipeline-hs
  podManagementPolicy:
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-data-pipeline
        app.kubernetes.io/instance: my-data-pipeline
    spec:
      serviceAccountName:
      containers:
        - name: goofys-full-logs
          image: "cloudposse/goofys:0.4.0"
          imagePullPolicy: Always
          securityContext:
            privileged: true
            runAsUser: 0
          env:
            - name: BUCKET
              value: my-logs.bucket.com
            - name: MOUNT_DIR
              value: /mnt/s3/full_logs
            - name: REGION
              value: us-east-1
            - name: DIR_MODE
              value: "0777"
            - name: FILE_MODE
              value: "0777"
          volumeMounts:
            - name: full-logs-mount
              mountPath: /mnt/s3/full_logs
              mountPropagation: Bidirectional
          resources:
            limits:
              cpu: "2"
              memory: 8Gi
            requests:
              cpu: "1"
              memory: 6Gi
          # lifecycle:
          #   preStop:
          #     exec:
          #       command: ["/bin/sh","-c","umount -f /mnt/s3/full_logs"]
        - name: goofys-archive-logs
          image: "cloudposse/goofys:0.4.0"
          imagePullPolicy: Always
          securityContext:
            privileged: true
            runAsUser: 0
          env:
            - name: BUCKET
              value: my-data.bucket.com
            - name: MOUNT_DIR
              value: /mnt/s3/archived_logs
            - name: REGION
              value: us-east-1
            - name: DIR_MODE
              value: "0777"
            - name: FILE_MODE
              value: "0777"
          volumeMounts:
            - name: archived-logs-mount
              mountPath: /mnt/s3/archived_logs
              mountPropagation: Bidirectional
          resources:
            limits:
              cpu: "2"
              memory: 8Gi
            requests:
              cpu: "1"
              memory: 6Gi
          # lifecycle:
          #   preStop:
          #     exec:
          #       command: ["/bin/sh","-c","umount -f /mnt/s3/archived_logs"]
        - name: pipeline-runner
          securityContext:
            privileged: true
            runAsUser: 0
          image: "my-data-service:1.0.0"
          imagePullPolicy: Always
          command:
          env:
            - name: FULL_LOGS_MOUNT_DIR
              value: /mnt/full_logs
            - name: ARCHIVED_LOGS_MOUNT_DIR
              value: /mnt/archived_logs
            - name: PIPELINE_WORKING_DIR
              value: /mnt/pipeline_working_disk
          volumeMounts:
            - name: full-logs-mount
              mountPath: /mnt/full_logs
              mountPropagation: Bidirectional
            - name: archived-logs-mount
              mountPath: /mnt/archived_logs
              mountPropagation: Bidirectional
            - name: pipeline-working-disk
              mountPath: /mnt/pipeline_working_disk
          resources:
            limits:
              cpu: "2"
              memory: 3Gi
            requests:
              cpu: "1"
              memory: 2Gi
      volumes:
        - name: full-logs-mount
          emptyDir: {}
        - name: archived-logs-mount
          emptyDir: {}
      restartPolicy: Always
      nodeSelector:
        workload: log-pipeline
  volumeClaimTemplates:
    - metadata:
        name: pipeline-working-disk
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: "200Gi"
```
Essentially there are two buckets that need to be mounted for the pipeline runner: one bucket is read from, that data is processed, and the results are written to the other. The bucket that is read from is the one that consistently fails after some period of time with the error above. I'm not sure what causes it or why. It works well for a few hours, but it always ends up in the same crashed state.
Also, the bucket it fails on has a LOT of data going back multiple years (thousands of TBs). Not sure if that has anything to do with it or not.
This may be an upstream goofys issue (https://github.com/kahing/goofys/issues), but I thought I would start here since this is where the image came from (along with the example K8s implementation). Any help is appreciated!
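One thing I'm considering trying is re-enabling the commented-out `preStop` hook from the spec above, but with a lazy unmount (`umount -l`) instead of `-f`, so the kernel detaches the mount point even if the goofys process has already died and can't respond. This is just a sketch of what I mean (I haven't confirmed it fixes the problem):

```yaml
lifecycle:
  preStop:
    exec:
      # Lazy unmount: detach the FUSE mount immediately and clean it up
      # once it's no longer busy, even if goofys has already crashed.
      command: ["/bin/sh", "-c", "umount -l /mnt/s3/full_logs"]
```

Note that `preStop` only runs on graceful termination, so if the container is OOM-killed or crashes outright this hook wouldn't fire anyway.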