This project uses a Kubernetes CustomResourceDefinition (CRD) to replace the Airflow Git Sync strategy. The main idea is to start a synchronization service, built as a Quarkus operator, on each Airflow pod to synchronize DAGs and files into the DAG folder.
The project includes the Docker image packaging (BuildConfig or Quarkus build) and a Helm chart template modified from the official Airflow chart (https://github.com/apache/airflow/tree/main/chart).
- Use `kubectl` to create the RBAC resources and the `CustomResourceDefinition` (the operator can also create these resources automatically).
- Create some DAG resource instances.
- Start the Quarkus operator.
- The operator lists the DAG resources or automatically detects changes to them, then creates, updates, or deletes DAGs in the DAG folder.
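The last step above can be sketched as a small reconciliation loop. This is an illustrative Python sketch, not the operator's actual code (the operator is a Quarkus application); the function name `sync_dag_folder` and the `desired` mapping are hypothetical, and the sketch assumes the operator owns every file in the DAG folder.

```python
from pathlib import Path


def sync_dag_folder(desired: dict[str, str], dag_dir: Path) -> dict[str, list[str]]:
    """Make dag_dir match `desired` (relative file name -> file content).

    Hypothetical helper: `desired` stands in for the DAG resources
    the operator has listed from the cluster.
    """
    dag_dir.mkdir(parents=True, exist_ok=True)
    changes = {"created": [], "updated": [], "deleted": []}

    # Create missing files and rewrite files whose content has changed.
    for name, content in desired.items():
        target = dag_dir / name
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
            changes["created"].append(name)
        elif target.read_text() != content:
            target.write_text(content)
            changes["updated"].append(name)

    # Delete files that no longer have a matching resource.
    # Assumption: nothing else writes into dag_dir.
    for existing in sorted(p for p in dag_dir.rglob("*") if p.is_file()):
        rel = existing.relative_to(dag_dir).as_posix()
        if rel not in desired:
            existing.unlink()
            changes["deleted"].append(rel)
    return changes
```

Calling the function again with the same `desired` mapping reports no changes, which is the property a reconcile loop typically relies on.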
For the resource description, see `02-crd.yaml`.
The CRD has several important attributes, which are described here:
Parameter | Description | Default |
---|---|---|
`type` | Type of the resource. It can be `dag_file`, `file`, or `dag_yaml`. A `dag_file` must be a DAG description file. A `file` can be a Python file or any other text file. `dag_yaml` is based on dag-factory, with some changes. | `dag_file` |
`path` | File path. If the file path is empty, it defaults to the root directory of `dags`; otherwise it is a subdirectory under `dags`. | |
`file_name` | If `type` is `file`, a `file_name` is needed. | |
`dag_name` | If `type` is `dag_file` or `dag_yaml`, a `dag_name` is needed. If `dag_name` does not have a `.py` suffix, the operator appends it automatically. | CRD name |
`content` | If `type` is `dag_file` or `file`, this is the content of the file. | |
`paused` | If `paused` is not empty, the operator scans the DAG status and automatically pauses or unpauses it. | |
`dag_yaml` | The description of the DAG in YAML. For details, refer to dag-factory. | |
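To make the naming rules in the table concrete, here is a minimal Python sketch of how a target file path could be derived from these attributes. It is an illustration only, not the operator's implementation: the function `resolve_target`, the `spec` dictionary shape, and the `dags` root name are assumptions.

```python
from pathlib import PurePosixPath


def resolve_target(spec: dict, crd_name: str, dags_root: str = "dags") -> str:
    """Return the file path implied by a resource spec (hypothetical helper)."""
    kind = spec.get("type") or "dag_file"  # default type per the table
    if kind == "file":
        name = spec.get("file_name")
        if not name:
            raise ValueError("type 'file' requires file_name")
    elif kind in ("dag_file", "dag_yaml"):
        # dag_name defaults to the CRD name; a missing .py suffix is appended.
        name = spec.get("dag_name") or crd_name
        if not name.endswith(".py"):
            name += ".py"
    else:
        raise ValueError(f"unknown type: {kind}")

    # An empty path means the root of the dags directory.
    sub = (spec.get("path") or "").strip("/")
    return str(PurePosixPath(dags_root) / sub / name)
```

For example, under these assumptions a resource named `my-dag` with an empty spec resolves to `dags/my-dag.py`.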
We can run the application in dev mode, which enables live coding, using:
./mvnw compile quarkus:dev
An example is provided in the `/example` folder. It includes `RBAC`, the `CRD`, some sample cases, and a `Deployment` for testing.
If we use OpenShift, we can use `BuildConfig` or `Tekton/Pipeline` to build a native image.
Otherwise, we can create a native executable using:
./mvnw package -Pnative
# on macOS, add -Dquarkus.native.container-build=true to build Quarkus in Docker with a Linux environment
docker build -f src/main/docker/Dockerfile.native -t quarkus/airflow-dag-operator .
Or, if we don't have GraalVM installed, we can run the native executable build in a container using:
./mvnw package -Pnative -Dquarkus.native.container-build=true
docker build -f src/main/docker/Dockerfile.native -t quarkus/airflow-dag-operator .
Run a Helm dependency update to add the PostgreSQL chart, then `lint` the chart. Helm 3 is required to build.
# dependency update
helm dep update
# lint
helm lint
# debug
helm install --dry-run --debug -f values.yaml airflow -n airflow .
Deploy Chart
# install
helm install -f values.yaml airflow -n airflow .
# upgrade
helm upgrade -f values.yaml airflow -n airflow .
# uninstall
helm uninstall airflow -n airflow
To support pausing, we need to rebuild the image
# to support pause, build with `quarkus.datasource.jdbc` changed from false to true
./mvnw package -Pnative -Dquarkus.datasource.jdbc=true
Because parsing a DAG's Python code is complex, we need to ensure that `dag_name` and `dag_id` are consistent for now.
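As a rough illustration of this constraint, the following Python sketch checks whether a DAG file declares exactly one literal `dag_id` that matches `dag_name`. It is not part of the operator; `find_dag_ids` and `is_consistent` are hypothetical helpers. The sketch only handles `DAG(...)` calls with a string literal and misses ids that are computed dynamically or declared another way, which hints at why full parsing is hard.

```python
import ast


def find_dag_ids(source: str) -> list[str]:
    """Collect literal dag_id values passed to calls named DAG."""
    ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both DAG(...) and something.DAG(...).
        callee = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if callee != "DAG":
            continue
        # dag_id may be a keyword argument or the first positional argument.
        candidates = [kw.value for kw in node.keywords if kw.arg == "dag_id"]
        if not candidates and node.args:
            candidates = [node.args[0]]
        for value in candidates:
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                ids.append(value.value)
    return ids


def is_consistent(dag_name: str, source: str) -> bool:
    """True if the file declares exactly one dag_id equal to dag_name (without .py)."""
    expected = dag_name[:-3] if dag_name.endswith(".py") else dag_name
    return find_dag_ids(source) == [expected]
```

The source is only parsed, never executed, so the check does not require Airflow to be installed.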
Note that the Helm chart has not been modified for this yet. By design, the operator only turns on `support pause` on the scheduler node to avoid repeated executions.
- 2021-10-09 1.0.0 First Commit
- 2021-11-16 1.0.1 Update to quarkus-operator-sdk 2.0.0
- 2022-03-27 1.0.1 Update to quarkus-operator-sdk 3.0.5
- 2022-07-15 1.0.2 Update to quarkus-operator-sdk 4.0.0.RC / Support paused