Airflow Web Server does not mount the ConfigMap for the Kubernetes pod template
akash-jain-10 opened this issue · 5 comments
Checks
- I have checked for existing issues.
- This report is about the User-Community Airflow Helm Chart.
Chart Version
8.6.1
Kubernetes Version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:14:41Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:12Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/arm64"}
Helm Version
version.BuildInfo{Version:"v3.10.2", GitCommit:"50f003e5ee8704ec937a756c646870227d7c8b58", GitTreeState:"clean", GoVersion:"go1.19.3"}
Description
We are using a custom Airflow plugin that helps us dynamically deploy DAGs via REST APIs.
During the deployment process, we trigger the SchedulerJob from Python code to force a scan of the newly created DAGs. We prefer this over tuning the scan interval.
However, when running these deployments on Kubernetes with the KubernetesExecutor, the new DAGs do not pick up the right pod_template_file. This is because the pod_template_file is only mounted in the Scheduler pod, not in the Webserver pod, where the plugins run.
If we manually mount the pod_template_file at the path referenced by AIRFLOW__KUBERNETES_EXECUTOR__POD_TEMPLATE_FILE, then everything works as expected.
This chart adds the pod_template_file only to the scheduler; the corresponding entry is missing from the webserver template. The official Airflow chart, however, does add the pod_template_file to the webserver as well.
Expected Behaviour -
Plugins that run on the Airflow Webserver and create DAGs on the fly (programmatically) should pick up the right pod template file, with the correct serviceAccount name and with the dags and logs directories mounted.
Actual Behaviour -
A new Kubernetes pod spun up for a DAG generated via a plugin running on the Airflow Webserver uses the default service account and is missing the Airflow environment variables (it does not use the pod_template.yaml file associated with the scheduler pod).
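For context, a pod template of the kind the scheduler pod receives might look like the sketch below. This is an illustrative assumption, not the chart's actual rendered output: the serviceAccountName, image, volume names, and PVC names are placeholders chosen to match the values further down.

```yaml
# Hypothetical sketch of a pod_template.yaml -- all names and paths here
# are illustrative assumptions, not the chart's rendered template.
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name  # placeholder; the executor overrides pod metadata
spec:
  serviceAccountName: airflow  # matches serviceAccount.name in the values below
  containers:
    # the KubernetesExecutor requires the main container to be named "base"
    - name: base
      image: docker.getcollate.io/openmetadata/ingestion:1.0.0
      volumeMounts:
        - name: dags-data
          mountPath: /opt/airflow/dags
        - name: logs-data
          mountPath: /opt/airflow/logs
  volumes:
    - name: dags-data
      persistentVolumeClaim:
        claimName: my-release-dags  # hypothetical PVC name
    - name: logs-data
      persistentVolumeClaim:
        claimName: my-release-logs  # hypothetical PVC name
```

Without such a template, a pod spawned from the webserver falls back to the executor defaults, which is the behaviour described above.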
Alternatives/workarounds -
We did manage a static workaround by passing extraVolumes and extraVolumeMounts for the webserver, but since the name of the ConfigMap is derived from the Helm release name, this is not a recommended way to get things going!
It would be great to take a look here and bake the ConfigMap mount into the Helm chart.
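Concretely, the static workaround can be sketched as the values fragment below. The ConfigMap name and mount path are assumptions (the chart derives the ConfigMap name from the release name), so verify both against your installed release, e.g. with kubectl get configmaps.

```yaml
web:
  extraVolumes:
    - name: pod-template
      configMap:
        # ASSUMPTION: the chart names this ConfigMap after the Helm release;
        # "my-release-pod-template" is a placeholder -- check the real name
        # with `kubectl get configmaps` in the airflow namespace.
        name: my-release-pod-template
  extraVolumeMounts:
    - name: pod-template
      # ASSUMPTION: the same path the scheduler uses for the pod template.
      mountPath: /opt/airflow/pod_templates
      readOnly: true
```

The drawback the report calls out is visible here: the ConfigMap name is hard-coded per release, so the values file breaks if the release is renamed.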
Relevant Logs
No logs to detect this! Just trial and error.
Custom Helm Values
```yaml
airflow:
  image:
    repository: docker.getcollate.io/openmetadata/ingestion
    tag: 1.0.0
    pullPolicy: "IfNotPresent"
  executor: "KubernetesExecutor"
  config:
    # This is required for OpenMetadata UI to fetch status of DAGs
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
    # OpenMetadata Airflow Apis Plugin DAGs Configuration
    AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/opt/airflow/dags"
    # OpenMetadata Airflow Secrets Manager Configuration
    AIRFLOW__OPENMETADATA_SECRETS_MANAGER__AWS_REGION: ""
    AIRFLOW__OPENMETADATA_SECRETS_MANAGER__AWS_ACCESS_KEY_ID: ""
    AIRFLOW__OPENMETADATA_SECRETS_MANAGER__AWS_ACCESS_KEY: ""
  users:
    - username: admin
      password: admin
      role: Admin
      email: spiderman@superhero.org
      firstName: Peter
      lastName: Parker
web:
  readinessProbe:
    enabled: true
    initialDelaySeconds: 60
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 10
  livenessProbe:
    enabled: true
    initialDelaySeconds: 60
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 10
postgresql:
  enabled: false
workers:
  enabled: false
flower:
  enabled: false
redis:
  enabled: false
externalDatabase:
  type: mysql
  host: mysql
  port: 3306
  database: airflow_db
  user: airflow_user
  passwordSecret: airflow-mysql-secrets
  passwordSecretKey: airflow-mysql-password
serviceAccount:
  create: true
  name: "airflow"
scheduler:
  logCleanup:
    enabled: false
dags:
  persistence:
    enabled: true
    # NOTE: "" means cluster-default
    storageClass: ""
    size: 1Gi
    accessMode: ReadWriteMany
logs:
  persistence:
    enabled: true
    # empty string means cluster-default
    storageClass: ""
    accessMode: ReadWriteMany
    size: 1Gi
```
This issue has been automatically marked as stale because it has not had activity in 60 days.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.
Issues never become stale if any of the following is true:
- they are added to a Project
- they are added to a Milestone
- they have the
lifecycle/frozen
label
Hey Team, want to follow up here! Is this something that can be added as an enhancement to the Airflow-Helm community charts?
@akash-jain-10 As far as I know, the existing airflow.kubernetesPodTemplate.* values work correctly to template the file referenced by AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE.
We also provide the airflow.kubernetesPodTemplate.stringOverride value to override the full template with a custom string value, if required.
Hello @thesuperzapper - The kubernetesPodTemplate seems to be mounted only in the scheduler, not in the webserver. Is this intentional? Any custom plugin that relies on the Kubernetes pod template being available on the webserver pod is failing for us!