pods stuck in wait-for-airflow-migrations
Closed this issue · 2 comments
espenthaem commented
Checks
- I have checked for existing issues.
- This report is about the User-Community Airflow Helm Chart.
Chart Version
1.13.1
Kubernetes Version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.13-eks-3af4770
WARNING: version difference between client (1.30) and server (1.27) exceeds the supported minor version skew of +/-1
Helm Version
version.BuildInfo{Version:"v3.15.1", GitCommit:"e211f2aa62992bd72586b395de50979e31231829", GitTreeState:"clean", GoVersion:"go1.22.3"}
Description
I'm trying to deploy Airflow, but my scheduler, triggerer, and webserver pods are stuck forever in the wait-for-airflow-migrations init container. However, a db-migrations job is never actually started.
I'm using a custom Docker image to include my package requirements:
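For anyone debugging the same hang: the init container's own logs show what it is waiting for, and the job list shows whether a migration job was ever created. A sketch (the pod name is a placeholder; substitute one from `kubectl get pods`):

```shell
# Logs of the wait-for-airflow-migrations init container of a stuck pod
kubectl -n airflow-dags logs <webserver-pod-name> -c wait-for-airflow-migrations

# Check whether the chart ever created a migration job
kubectl -n airflow-dags get jobs
```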
# Use the official Apache Airflow image from Docker Hub
FROM apache/airflow:2.8.3
USER root
# Set environment variables
ENV AIRFLOW_HOME=/opt/airflow
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsasl2-dev \
gcc \
build-essential \
unzip \
python3-dev \
default-libmysqlclient-dev \
libpq-dev \
jq \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
USER airflow
ENV PIP_ENV_VERSION=24.0
RUN python -m pip install --no-cache-dir --upgrade pip==${PIP_ENV_VERSION}
# Copy requirements once and install from the copied path
COPY requirements/requirements.txt /tmp/tmp-pip/
RUN python -m pip install --no-cache-dir -r /tmp/tmp-pip/requirements.txt
RUN pip list
ENTRYPOINT ["bash", "-c", "airflow db init && airflow webserver & airflow scheduler"]
Relevant Logs
pod/airflow-postgresql-0 1/1 Running 0 2m14s
pod/airflow-scheduler-79f69f58cf-75g6j 0/3 Init:0/2 0 2m14s
pod/airflow-statsd-7c56d8b68-5qntz 1/1 Running 0 2m14s
pod/airflow-triggerer-0 0/3 Init:0/2 1 (119s ago) 2m14s
pod/airflow-webserver-6cdb66595f-pr7xc 0/1 Init:0/1 0 2m14s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m26s default-scheduler Successfully assigned airflow-dags/airflow-webserver-6cdb66595f-pr7xc to ip-10-30-196-195.eu-west-1.compute.internal
Normal Pulling 2m26s kubelet Pulling image "eu.gcr.io/medicuja/airflow-dags:ci_cd_helm_dependency-7"
Normal Pulled 116s kubelet Successfully pulled image "eu.gcr.io/my-project/airflow-dags:my-image-tag" in 30.190834116s (30.190864716s including waiting)
Normal Created 115s kubelet Created container wait-for-airflow-migrations
Normal Started 115s kubelet Started container wait-for-airflow-migrations
Custom Helm Values
images:
  airflow:
    repository: eu.gcr.io/my-project/airflow-dags
    tag: latest

airflow:
  dbMigrations:
    enabled: True
    runAsJob: True

dags:
  gitSync:
    branch: main
    enabled: true
    repo: 'git@github.com:my-repo/airflow-dags.git'
    subPath: 'dags'
    rev: HEAD
    sshKeySecret: airflow-git-key
  persistence:
    accessMode: ReadWriteOnce
    annotations: {}
    enabled: false
    existingClaim: null
    size: 1Gi
    storageClassName: null
    subPath: null

config:
  webserver:
    expose_config: 'True'

executor: KubernetesExecutor

extraEnv: |
  - name: "AIRFLOW__CORE__PLUGINS_FOLDER"
    value: "/opt/airflow/dags/repo/plugins"
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: "True"
  - name: PYTHONPATH
    value: "/opt/airflow/dags/repo"

registry:
  secretName: my-project-gcr-secret-basic-service
I'm not passing a --wait flag explicitly (though note that --atomic implies --wait), and I'm not deploying with ArgoCD. Here's my deploy command:
helm upgrade airflow apache-airflow/airflow --namespace airflow-dags -f helm/values.yaml --set 'images.airflow.tag=image-tag' --atomic --install --timeout=5m
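Since `--atomic` turns on `--wait` automatically, and waiting can deadlock with a hook-based migration job (Helm waits for pods that are themselves waiting for the hook job), one thing worth trying is the same deploy without `--atomic` (a sketch, not a confirmed fix):

```shell
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow-dags \
  -f helm/values.yaml \
  --set 'images.airflow.tag=image-tag' \
  --install --timeout=5m   # --atomic dropped so --wait is not implied
```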
espenthaem commented
I've also discovered I can force the migration job to run by disabling the Helm hooks on the migrateDatabaseJob:
migrateDatabaseJob:
  useHelmHooks: false
  enabled: true
The migration job seems to complete the init and the migration itself, but never shuts down:
WARNING:root:OSError while attempting to symlink the latest log directory
DB: postgresql://postgres:***@airflow-postgresql.airflow-dags:5432/postgres?sslmode=disable
/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/db_command.py:47 DeprecationWarning: `db init` is deprecated. Use `db migrate` instead to migrate the db and/or airflow connections create-default-connections to create the default connections
[2024-06-07T10:38:47.561+0000] {migration.py:216} INFO - Context impl PostgresqlImpl.
[2024-06-07T10:38:47.571+0000] {migration.py:219} INFO - Will assume transactional DDL.
[2024-06-07T10:38:48.137+0000] {migration.py:216} INFO - Context impl PostgresqlImpl.
[2024-06-07T10:38:48.137+0000] {migration.py:219} INFO - Will assume transactional DDL.
[2024-06-07T10:38:48.166+0000] {db.py:1623} INFO - Creating tables
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
[2024-06-07T10:38:49.226+0000] {task_context_logger.py:63} INFO - Task context logging is enabled
[2024-06-07T10:38:49.227+0000] {executor_loader.py:115} INFO - Loaded executor: KubernetesExecutor
/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py:165 FutureWarning: The config section [kubernetes] has been renamed to [kubernetes_executor]. Please update your `conf.get*` call to use the new name
[2024-06-07T10:38:49.433+0000] {scheduler_job_runner.py:808} INFO - Starting the scheduler
[2024-06-07T10:38:49.434+0000] {scheduler_job_runner.py:815} INFO - Processing each file at most -1 times
[2024-06-07T10:38:49.436+0000] {kubernetes_executor.py:318} INFO - Start Kubernetes executor
[2024-06-07T10:38:49.514+0000] {kubernetes_executor_utils.py:157} INFO - Event: and now my watch begins starting at resource_version: 0
[2024-06-07T10:38:49.520+0000] {kubernetes_executor.py:239} INFO - Found 0 queued task instances
[2024-06-07T10:38:49.535+0000] {manager.py:169} INFO - Launched DagFileProcessorManager with pid: 37
[2024-06-07T10:38:49.548+0000] {scheduler_job_runner.py:1608} INFO - Adopting or resetting orphaned tasks for active dag runs
[2024-06-07T10:38:49.586+0000] {settings.py:60} INFO - Configured default timezone UTC
[2024-06-07T10:38:49.682+0000] {settings.py:541} INFO - Loaded airflow_local_settings from /opt/airflow/config/airflow_local_settings.py .
[2024-06-07T10:38:49.713+0000] {scheduler_job_runner.py:1631} INFO - Marked 3 SchedulerJob instances as failed
Initialization done
[2024-06-07T10:39:07.533+0000] {configuration.py:2066} INFO - Creating new FAB webserver config file in: /opt/airflow/webserver_config.py
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Access Logformat:
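The log above shows a scheduler and a webserver starting inside the migration job, which suggests the image's custom ENTRYPOINT (`airflow db init && airflow webserver & airflow scheduler`) is being executed there rather than a plain migration command. One way to check what the job's pod actually runs (the label-existence selector is an assumption and may differ per chart version):

```shell
# Print name, command, and args for pods created by any Job (pods carry a job-name label)
kubectl -n airflow-dags get pods -l job-name \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].command}{"\t"}{.spec.containers[0].args}{"\n"}{end}'
```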
espenthaem commented
I've just realized I'm actually not using the community edition of Airflow helm chart. My bad. I'll close this.