A plugin for Apache Airflow.
##Setup:
At the moment, I have not set up binary builds, so adding it with pip will require a "git+< url >" entry.
###Prerequisites:
- apache-airflow
- click
- cryptography
- gitpython

`pip install apache-airflow click cryptography gitpython`

NOTE: Airflow must be installed and configured prior to plugin install.
###Installation:
NOTE: This will also install the needed prerequisites.

`pip install git+https://github.com/DACRepair/airflow-repoman.git`

Once this has been completed, verify that the `dags_folder` setting in your Airflow configuration points to where you want your DAGs to live.
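
One way to check the DAG folder setting mentioned above is to read it straight out of the configuration file. The snippet below is a minimal sketch, assuming a stock Airflow install where the value lives under `[core]` as `dags_folder`; the sample text and the `read_dags_folder` helper are illustrative and not part of this plugin.

```python
import configparser

# Stand-in for the contents of airflow.cfg (the path shown is an assumption).
SAMPLE_CFG = """
[core]
dags_folder = /opt/airflow/dags
"""


def read_dags_folder(cfg_text: str) -> str:
    """Return the [core] dags_folder value from airflow.cfg-style text."""
    # Interpolation is disabled because a real airflow.cfg contains
    # literal '%' characters (for example in log format strings).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "dags_folder")


if __name__ == "__main__":
    print(read_dags_folder(SAMPLE_CFG))
```

To use it against a real install, pass in the text of your own `airflow.cfg` (commonly found under `$AIRFLOW_HOME`).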
Open up a command line and run `airflow-repoman init`. If this does not work, verify that there is a binary in your bin folder.
###Airflow Webserver
To access the DAG Repos, simply navigate to Admin -> DAG Repos (this applies to both flask_admin and F.A.B).
Once you are on the list page, press `+` or `Create` to begin generating a record.
Field | Description |
---|---|
Repo Name | The name of the repo (used to name the folder / for organizational use) |
Repo Enabled | Enable or Disable sync |
Repo URL | The URL to the git repo. |
Repo Branch | The branch you would like to pull your DAGs from. |
Repo Username | The username used to access the git repo. |
Repo Password | The password used to access the git repo (this field is encrypted in the database, much like Connection passwords). |
Refresh Interval | The rate that the app will check for changes (in seconds). |
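
To make the fields above concrete, here is a hypothetical record expressed as a small Python data class. This is only a sketch of the shape of the data: the class name, attribute names, default values, and example URL are assumptions for illustration, not the plugin's actual database model, and it assumes the repo folder sits directly under the DAG folder.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RepoRecord:
    """Hypothetical mirror of the form fields (not the plugin's real model)."""

    name: str                    # Repo Name: also names the folder
    url: str                     # Repo URL
    branch: str = "master"       # Repo Branch (default here is an assumption)
    enabled: bool = True         # Repo Enabled
    username: str = ""           # Repo Username
    password: str = ""           # Repo Password (the plugin stores it encrypted)
    refresh_interval: int = 300  # Refresh Interval, in seconds (assumed default)

    def target_folder(self, dags_folder: str) -> Path:
        # Assumes the repo is checked out directly under the DAG folder.
        return Path(dags_folder) / self.name


example = RepoRecord(
    name="team-etl",
    url="https://git.example.com/team/etl-dags.git",
    refresh_interval=600,
)

if __name__ == "__main__":
    print(example.target_folder("/opt/airflow/dags"))
```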
Note about SSH repos:
Officially, this plugin does not support SSH repos, but that does not mean they do not work. You will need password-less login to the git server, or the proper keys set up in the OS. Beyond that, make sure the record has a proper URL and that "Repo Username" is set. The password field is not used for SSH at all. Functionally, the plugin just passes this information to the `git` command, so you can adapt it however you want, but it is not something the maintainer wants to support.
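
As a rough illustration of "passing information to the git command", the sketch below embeds a username in an `ssh://` URL and builds a `git clone` argument list. Both helpers (`with_username`, `clone_command`) and the example host are hypothetical; the plugin's real invocation may differ, and scp-style addresses such as `git@host:path` are not handled here.

```python
from urllib.parse import urlsplit, urlunsplit


def with_username(url: str, username: str) -> str:
    """Return the URL with `username` placed before the host (illustrative)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"{username}@{host}", parts.path, parts.query, parts.fragment)
    )


def clone_command(url: str, branch: str, dest: str) -> list:
    """Build an argument list for cloning a single branch into `dest`."""
    return ["git", "clone", "--branch", branch, "--single-branch", url, dest]


if __name__ == "__main__":
    ssh_url = with_username("ssh://git.example.com/team/dags.git", "git")
    print(clone_command(ssh_url, "master", "/opt/airflow/dags/team-etl"))
```

With key-based login in place, git needs no password for a URL of this form, which matches the note that the password field is unused for SSH.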
To run the repo sync job, run `airflow-reposync reposync`. This runs the sync once and exits, which suits a cronjob or a one-shot sync.
If you want to run it continuously (i.e. like a daemon), pass the `--continuous` flag at the end of the command.
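
The difference between the one-shot and continuous modes, together with the per-repo refresh interval, can be sketched as a simple loop. This is an assumed model of the behaviour described above, not the plugin's source; `is_due`, `run`, `poll_seconds`, and `max_passes` are hypothetical names, and `max_passes` exists only so the example can terminate.

```python
import time
from typing import Callable, Optional


def is_due(last_sync: Optional[float], interval: int, now: float) -> bool:
    """A repo is due if it never synced or its interval (seconds) has elapsed."""
    return last_sync is None or now - last_sync >= interval


def run(
    sync_once: Callable[[], None],
    continuous: bool = False,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_passes: Optional[int] = None,
) -> int:
    """Run one sync pass, or keep looping when `continuous` is set."""
    passes = 0
    while True:
        sync_once()
        passes += 1
        # One-shot mode stops after the first pass; max_passes bounds the demo.
        if not continuous or (max_passes is not None and passes >= max_passes):
            return passes
        sleep(poll_seconds)


if __name__ == "__main__":
    run(lambda: print("sync pass"))  # one-shot, like a cron invocation
    run(lambda: print("sync pass"), continuous=True, poll_seconds=0.1, max_passes=3)
```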
The airflow-reposync job runner only touches files in your DAG folder that were created by the reposync. Manually adding other, unmanaged DAGs is fine. Enabling or disabling a repo does not delete its folder in the DAG folder, nor does it affect whether its DAGs can run (it only turns automatic updates on or off). However, if you remove a repo from the web panel, it WILL delete the folder it generated for that repo.
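
The removal behaviour described above can be pictured with a small sketch that deletes only the folder belonging to a removed repo and leaves everything else in the DAG folder alone. The `remove_repo_folder` helper and the file names are hypothetical, and the layout (one folder per repo directly under the DAG folder) is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def remove_repo_folder(dags_folder: Path, repo_name: str) -> bool:
    """Delete dags_folder/repo_name if it is a direct child directory."""
    target = (dags_folder / repo_name).resolve()
    # Refuse anything that is not a direct child directory of the DAG folder.
    if target.parent != dags_folder.resolve() or not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


def demo() -> list:
    """Build a throwaway DAG folder, remove one repo, list what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        dags = Path(tmp)
        (dags / "team-etl").mkdir()
        (dags / "team-etl" / "etl_dag.py").write_text("# managed DAG\n")
        (dags / "my_manual_dag.py").write_text("# unmanaged DAG\n")
        remove_repo_folder(dags, "team-etl")
        return sorted(p.name for p in dags.iterdir())


if __name__ == "__main__":
    print(demo())
```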
Note about git:
Since this syncs DAGs via git, you will need to treat the synced DAG folder as read-only. If your DAGs generate flat files there (CSVs, blobs, etc.), make sure you add an exception for them to your `.gitignore` file.
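
As a small illustration of the `.gitignore` advice, the sketch below appends ignore patterns for generated files without duplicating entries that are already present. The `ensure_ignored` helper and the patterns (`*.csv`, `output/`) are examples only; adjust them to whatever your DAGs actually write.

```python
import tempfile
from pathlib import Path


def ensure_ignored(repo_root: Path, patterns: list) -> list:
    """Append any missing patterns to repo_root/.gitignore; return those added."""
    gitignore = repo_root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = text.splitlines()
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Keep existing content and make sure new entries start on a fresh line.
        prefix = "" if not text or text.endswith("\n") else "\n"
        gitignore.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(ensure_ignored(root, ["*.csv", "output/"]))
        print((root / ".gitignore").read_text())
```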