Unable to write logs to s3
Closed · 1 comment
Checks
- I have checked for existing issues.
- This report is about the User-Community Airflow Helm Chart.
Chart Version
8.7.1
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:43Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Helm Version
version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}
Description
We used to send logs directly to S3. We have now decided to write them to the standard directory /opt/airflow/logs instead, by mounting a PVC that is backed by an S3 bucket, as mentioned here. Since logs from various DAGs were often not written, we decided to address the log loss by switching to and testing the different logging approach described here.
After applying these changes, the airflow-db-migrations pod starts, but its check-db container fails with a PermissionError for /opt/airflow/logs/scheduler.
I'm attaching the complete log below.
If we revert everything back to how it was before and connect directly as described here (S3 Bucket), everything starts correctly and all pods come up without errors.
I'm providing all our configurations for PVC/PV and Airflow below.
Could you please advise where the error might be? Thank you in advance!
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs
  namespace: airflow
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: csi-s3
  volumeName: airflow-logs
PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs
  namespace: airflow
spec:
  storageClassName: csi-s3
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: ru.yandex.s3.csi
    volumeHandle: bucket/airflow/logs
    controllerPublishSecretRef:
      name: csi-s3-secret
      namespace: airflow
    nodePublishSecretRef:
      name: csi-s3-secret
      namespace: airflow
    nodeStageSecretRef:
      name: csi-s3-secret
      namespace: airflow
    volumeAttributes:
      capacity: 1Gi
      mounter: geesefs
Relevant Logs
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2024-02-01'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/logging/config.py", line 563, in configure
    handler = self.configure_handler(handlers[name])
  File "/usr/local/lib/python3.8/logging/config.py", line 744, in configure_handler
    result = factory(**kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/log/file_processor_handler.py", line 49, in __init__
    Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1292, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/usr/local/lib/python3.8/pathlib.py", line 1288, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__init__.py", line 64, in <module>
    settings.initialize()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/settings.py", line 570, in initialize
    LOGGING_CLASS_PATH = configure_logging()
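For context on where the traceback originates: at startup, Airflow's file-processor log handler pre-creates a per-date directory under the log folder with `Path(...).mkdir(parents=True, exist_ok=True)`. A minimal sketch of that behaviour (the function name and paths are illustrative, not Airflow's actual API; `base_log_folder` stands in for /opt/airflow/logs):

```python
import tempfile
from datetime import date
from pathlib import Path

def ensure_log_directory(base_log_folder: str) -> Path:
    # Equivalent of what FileProcessorHandler does on init: build
    # <base_log_folder>/scheduler/<YYYY-MM-DD> and create it, parents included.
    # If the mounted log folder is not writable by the airflow user
    # (UID 50000 in the official image), mkdir raises
    # PermissionError: [Errno 13], matching the traceback above.
    log_dir = Path(base_log_folder) / "scheduler" / date.today().isoformat()
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir

# On a writable mount, the nested directories are created without error:
with tempfile.TemporaryDirectory() as base:
    created = ensure_log_directory(base)
    print(created.is_dir())  # True
```

This is why the error surfaces in the check-db container before any DAG runs: the logging config is loaded (and the directory created) as soon as the `airflow` CLI imports its settings.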
Custom Helm Values
airflow:
  legacyCommands: false
  executor: CeleryKubernetesExecutor
  image:
    repository: cr.yandex/crp1uvj38k3uhag59uoq/airflow-2.5.3-python3.8
    tag: mars-image-0.1.8
  defaultNodeSelector:
    custom.yandex.cloud/node-group-name: platform
  config:
    AIRFLOW__CELERY__FLOWER_URL_PREFIX: /airflow/flower
    AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: 120
    AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT: 100
    AIRFLOW__LOGGING__LOGGING_LEVEL: INFO
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: s3://bucket
    AIRFLOW__LOGGING__REMOTE_LOGGING: False
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: yandex_s3
    AIRFLOW__METRICS__STATSD_HOST: prometheus-statsd-exporter
    AIRFLOW__METRICS__STATSD_ON: True
    AIRFLOW__METRICS__STATSD_PORT: 9125
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: True
    AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: 120
    AIRFLOW__SECRETS__BACKEND: airflow.providers.hashicorp.secrets.vault.VaultBackend
    AIRFLOW__WEBSERVER__BASE_URL: https://dev.yac.rupn.cloud-effem.com/airflow
    AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX: True
  extraEnv:
    - name: AIRFLOW__SECRETS__BACKEND_KWARGS
      valueFrom:
        secretKeyRef:
          key: value
          name: vault-backend-kwargs
    - name: AIRFLOW__CORE__FERNET_KEY
      valueFrom:
        secretKeyRef:
          key: value
          name: airflow-fernet-key
    - name: AIRFLOW__WEBSERVER__SECRET_KEY
      valueFrom:
        secretKeyRef:
          key: value
          name: airflow-webserver-key
  extraPipPackages:
    - apache-airflow-providers-hashicorp==3.3.0
    - hvac==1.1.0
  extraVolumeMounts:
    - mountPath: /opt/airflow/plugins
      name: airflow-plugins
      readOnly: true
    - mountPath: /opt/airflow/logs
      name: airflow-logs
  extraVolumes:
    - name: airflow-plugins
      persistentVolumeClaim:
        claimName: airflow-plugins
    - name: airflow-logs
      persistentVolumeClaim:
        claimName: airflow-logs
  users:
    - email: ${ADMIN_EMAIL}
      firstName: admin
      lastName: admin
      password: ${ADMIN_PASSWORD}
      role: Admin
      username: admin
  usersTemplates:
    ADMIN_EMAIL:
      key: email
      kind: secret
      name: admin-user
    ADMIN_PASSWORD:
      key: password
      kind: secret
      name: admin-user
  usersUpdate: true
dags:
  path: /opt/airflow/dags
  persistence:
    enabled: true
    existingClaim: airflow-dags
web:
  enabled: true
  webserverConfig:
    existingSecret: airflow-webserver-config
flower:
  enabled: true
ingress:
  enabled: true
  apiVersion: networking.k8s.io/v1
  web:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
    host: dev.yac.rupn.cloud-effem.com
    path: /airflow
    ingressClassName: nginx
    tls:
      enabled: true
      secretName: tls-secret
  flower:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
    host: dev.yac.rupn.cloud-effem.com
    path: /airflow/flower
    ingressClassName: nginx
    tls:
      enabled: true
      secretName: tls-secret
redis:
  enabled: true
  existingSecret: airflow-redis
  existingSecretKey: redis-password
postgresql:
  enabled: true
  existingSecret: airflow-postgresql
  existingSecretKey: postgresql-password
  persistence:
    enabled: true
    storageClass: yc-network-ssd-nonreplicated
    size: 93Gi
serviceAccount:
  create: true
  name: airflow
serviceMonitor:
  enabled: true
  selector:
    prometheus: platform
scheduler:
  replicas: 1
triggerer:
  enabled: true
workers:
  enabled: true
  replicas: 1
  nodeSelector:
    custom.yandex.cloud/node-group-name: dev
  extraVolumes:
    - name: yandex-sa-secret-volume
      secret:
        secretName: airflow-sa-key
  extraVolumeMounts:
    - name: yandex-sa-secret-volume
      mountPath: /etc/yc
      readOnly: true
This was fixed by granting permissions via the mount options under volumeAttributes in the PV manifest:
volumeAttributes:
  options: "--memory-limit 1000 --dir-mode 0777 --file-mode 0666 --uid 50000"
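For anyone hitting the same PermissionError, the options slot into the PV's spec.csi.volumeAttributes from the manifest above. A sketch assuming the same geesefs mounter (the flag values are the ones that worked here; --uid 50000 matches the airflow user in the official image, and the 0777/0666 modes make the bucket-backed directories and files writable by it):

```yaml
# Excerpt of the PersistentVolume spec with geesefs mount options added.
csi:
  driver: ru.yandex.s3.csi
  volumeHandle: bucket/airflow/logs
  volumeAttributes:
    capacity: 1Gi
    mounter: geesefs
    options: "--memory-limit 1000 --dir-mode 0777 --file-mode 0666 --uid 50000"
```

0777/0666 are the broadest settings that resolve the error; tighter modes (e.g. 0755/0644 with a matching --uid) may also work depending on which containers need to write.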