/airflow-studies

Repo to publish all my airflow studies and experiments.

Primary LanguagePythonMIT LicenseMIT

airflow-studies

Repo to publish all my airflow studies and experiments.

Observation

There is a dependency problem with the constraints used by Airflow regarding lazy-object-proxy==1.9.0, so I created a constraints.txt and removed the constraint related to this dependency. The removal is automated with bash script, so you dont't have to worry about it.

Quickstart

To put an Airflow server up using a Python Virtual Environment and get started with your studies, just run:

chmod +x ./quickstart/start.sh && ./quickstart/start.sh

Using Docker

To start Airflow using Docker containers, use:

docker-compose up --build

Permissions

chmod -R 777 {$pwd}/docker/dags

Parameters

Change in the airflow.cfg these parameters for convenient update of DAGs:

# Number of seconds after which a DAG file is parsed. The DAG file is parsed every
# ``min_file_process_interval`` number of seconds. Updates to DAGs are reflected after
# this interval. Keeping this number low will increase CPU usage.
min_file_process_interval = 5

# How often (in seconds) to check for stale DAGs (DAGs which are no longer present in
# the expected files) which should be deactivated, as well as datasets that are no longer
# referenced and should be marked as orphaned.
parsing_cleanup_interval = 60

# How often (in seconds) to scan the DAGs directory for new files. Default to 5 minutes.
dag_dir_list_interval = 5

Restarting the server

docker-compose up -d --no-deps --build airflow-webserver airflow-scheduler

https://stackoverflow.com/questions/46059161/airflow-how-to-pass-xcom-variable-into-python-function https://stackoverflow.com/questions/51734923/airflow-python-operator-writes-files-to-different-locations