airflow-helm/charts

Airflow core parallelism set to 512, but tasks still run at a maximum of 32 at a time.

asif2017 opened this issue · 2 comments

Checks

Chart Version

8.7.1

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-08T10:15:02Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"4009cfb7c076b982a1e38b079ae363ece1eb1a19", GitTreeState:"clean", BuildDate:"2023-06-12T18:46:22Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Helm Version

version.BuildInfo{Version:"v3.12.3", GitCommit:"3a31588ad33fe3b89af5a2a54ee1d25bfe6eaa5e", GitTreeState:"clean", GoVersion:"go1.20.7"}

Description

I have set Airflow core parallelism to 512, but tasks run at a maximum of 32 at a time. My default pool has 128 slots, so I thought the limit might come from the pool slots and that the parallelism setting was simply not taking effect. To test this, I set parallelism to 56, and in that case only 16 tasks ran at a time at most.
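For reference, concurrent task execution is capped by several settings at once, and the effective limit is the smallest of them (a sketch with illustrative values, not my exact file):

config:
  AIRFLOW__CORE__PARALLELISM: 512               # global cap across the whole deployment
  AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 128  # cap per DAG
  AIRFLOW__CELERY__WORKER_CONCURRENCY: 16       # cap per Celery worker (default 16)
# default_pool slots (128 here) add a further cap, adjustable in the UI or with `airflow pools set`.
# With 2 workers at the default concurrency of 16, running tasks top out at 2 x 16 = 32.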

The second issue I encountered is high memory usage. I have around 350 DAGs, and even though only 32 tasks can run in parallel, memory consumption reaches almost 20-25 GB. Is this expected, or could it be a memory leak?
I'm using the CeleryExecutor with Postgres as the metadata database. Below are my memory consumption details.
Airflow version 2.5.3

NAME                             CPU(cores)   MEMORY(bytes)
airflow-cluster-db-migrations    2m           227Mi
airflow-cluster-flower           8m           293Mi
airflow-cluster-pgbouncer        344m         23Mi
airflow-cluster-redis-master-0   7m           14Mi
airflow-cluster-scheduler        698m         3324Mi
airflow-cluster-triggerer        176m         841Mi
airflow-cluster-web              118m         1512Mi
airflow-cluster-worker-0         212m         5429Mi
airflow-cluster-worker-1         234m         10059Mi

Relevant Logs

No response

Custom Helm Values

config:
  AIRFLOW__CORE__PARALLELISM: 512
  AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 128
  AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: 1

I am using the default values for all the pods; the values above are the only ones I added to config for parallelism.

Okay, I found the issue. I had set AIRFLOW__CORE__PARALLELISM, but I was missing one more setting, AIRFLOW__CELERY__WORKER_CONCURRENCY, which defaults to 16 per worker. This also needs to be set for the parallelism to take effect.
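For anyone hitting the same cap, a minimal sketch of the combination that resolves it (the concurrency value is illustrative; size it to your workers):

config:
  AIRFLOW__CORE__PARALLELISM: 512
  AIRFLOW__CELERY__WORKER_CONCURRENCY: 64   # default is 16 per worker
# Total running tasks is roughly (number of workers) x worker_concurrency,
# still bounded by core parallelism and default_pool slots.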

But the memory consumption is still too high. Both workers share the tasks fairly evenly, yet worker 1 consistently uses about double the memory of worker 0.
I also notice that if a pod is not restarted, its memory usage keeps growing day by day until it does restart. I deployed a new cluster 2 days ago with 40 DAGs; the triggerer pod had not restarted since the first deployment, and today it was consuming 975Mi. After I restarted the triggerer pod, its memory usage dropped to about 200Mi.
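As a possible stopgap, memory limits on the affected pods would make Kubernetes restart them before they grow unbounded. A sketch, assuming the chart exposes per-component resources blocks as in its values.yaml (the limits shown are illustrative, not a recommendation):

workers:
  resources:
    limits:
      memory: 8Gi   # worker pod is restarted if it exceeds this
triggerer:
  resources:
    limits:
      memory: 1Gi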

This issue has been automatically marked as stale because it has not had activity in 60 days.
It will be closed in 7 days if no further activity occurs.

Thank you for your contributions.


Issues never become stale if any of the following is true:

  1. they are added to a Project
  2. they are added to a Milestone
  3. they have the lifecycle/frozen label