This Airflow DAG demonstrates how to synchronize DAG metadata with Google Cloud Dataplex Catalog. It extracts information about DAGs and their tasks, then creates or updates the corresponding entries in the Dataplex Catalog.
This is a demonstration project and is not an official Google Cloud product. The code provided is not production-ready and should be used for educational and experimental purposes only.
The DAG performs the following main operations:
- Retrieves a list of all DAGs in the Airflow environment.
- For each DAG:
  - Extracts its metadata (ID, description, schedule, etc.), as sketched below
  - Creates or updates a corresponding entry in the Dataplex Catalog
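For illustration, here is a minimal sketch of the per-DAG metadata extraction, assuming Airflow 2.x. The helper name `extract_dag_metadata` and the exact set of fields are assumptions for this sketch, not the project's fixed schema:

```python
from airflow.models import DagBag


def extract_dag_metadata(dag):
    """Collect the DAG fields to export to Dataplex.

    The field set here is illustrative; the real DAG may export
    more or fewer attributes.
    """
    return {
        "dag_id": dag.dag_id,
        "description": dag.description or "",
        "schedule": str(dag.schedule_interval),
        "owner": dag.owner,  # comma-separated owners of the DAG's tasks
        "tags": list(dag.tags or []),
    }


# Iterate over every DAG known to this Airflow environment.
dag_bag = DagBag()
all_metadata = [extract_dag_metadata(dag) for dag in dag_bag.dags.values()]
```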
- Google Cloud Platform account with Dataplex API enabled
- Airflow environment set up and running
- Necessary permissions to access DAG information and modify Dataplex Catalog
Before running the DAG, ensure the following variables are set in the DAG file (example values are shown below the list):
- `project_id`: Your Google Cloud Project ID
- `location`: The location of your Dataplex Catalog (e.g., "us-central1")
- `entry_group_id`: The ID of the Dataplex entry group
- `entry_type`: The type of entry (e.g., "airflow-dag")
- `aspect_type`: The aspect type for DAG metadata
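For example, at the top of the DAG file (all values here are placeholders; substitute your own project and resource IDs):

```python
# Placeholder values; replace with your own project and resource IDs.
project_id = "my-gcp-project"
location = "us-central1"
entry_group_id = "airflow-dags"
entry_type = "airflow-dag"
aspect_type = "airflow-dag-metadata"
```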
The DAG consists of the following main components:
- `export_dags_to_dataplex`: A PythonOperator that processes all DAGs
- `create_entry`: Function to create or update an entry in Dataplex Catalog (sketched below)
- `export_dags_to_dataplex`: Function to iterate through all DAGs and create entries
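A hedged sketch of what `create_entry` might look like, using the `google-cloud-dataplex` Python client and the configuration variables above. The resource-name and aspect-key formats follow the public `dataplex_v1` Catalog API, but the exact payload your deployment needs may differ:

```python
from google.api_core.exceptions import AlreadyExists
from google.cloud import dataplex_v1
from google.protobuf import field_mask_pb2, struct_pb2


def create_entry(client, dag_metadata):
    """Create a Dataplex Catalog entry for one DAG; update it if it
    already exists. `dag_metadata` is the per-DAG dict built above."""
    parent = (
        f"projects/{project_id}/locations/{location}"
        f"/entryGroups/{entry_group_id}"
    )

    # Aspect payloads are free-form Structs validated against the
    # aspect type's metadata template.
    data = struct_pb2.Struct()
    data.update(dag_metadata)

    entry = dataplex_v1.Entry(
        entry_type=(
            f"projects/{project_id}/locations/{location}"
            f"/entryTypes/{entry_type}"
        ),
        # Aspect map keys use the "<project>.<location>.<aspect_type>" form.
        aspects={
            f"{project_id}.{location}.{aspect_type}": dataplex_v1.Aspect(
                aspect_type=(
                    f"projects/{project_id}/locations/{location}"
                    f"/aspectTypes/{aspect_type}"
                ),
                data=data,
            )
        },
    )

    try:
        return client.create_entry(
            parent=parent, entry_id=dag_metadata["dag_id"], entry=entry
        )
    except AlreadyExists:
        # Entry already cataloged: overwrite its aspects in place.
        entry.name = f"{parent}/entries/{dag_metadata['dag_id']}"
        return client.update_entry(
            entry=entry,
            update_mask=field_mask_pb2.FieldMask(paths=["aspects"]),
        )
```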
First set up the Dataplex Entry Type, Entry Group, and Aspect Type using create_aspect_type.py. Once the DAG is added to your Airflow environment and the necessary configuration is in place, it will run on the specified schedule (daily at 00:00 by default).
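For reference, a minimal sketch of what that setup amounts to with the `dataplex_v1` client. The aspect type additionally requires a metadata template describing the DAG fields, so refer to create_aspect_type.py for the full definition:

```python
from google.cloud import dataplex_v1

client = dataplex_v1.CatalogServiceClient()
parent = f"projects/{project_id}/locations/{location}"

# Entry group that will hold one entry per DAG. Creation is a
# long-running operation, hence .result().
client.create_entry_group(
    parent=parent,
    entry_group_id=entry_group_id,
    entry_group=dataplex_v1.EntryGroup(description="Airflow DAGs"),
).result()

# Entry type that each DAG entry will reference.
client.create_entry_type(
    parent=parent,
    entry_type_id=entry_type,
    entry_type=dataplex_v1.EntryType(description="An Airflow DAG"),
).result()
```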
- Automatic extraction of DAG metadata including ID, description, schedule, and owner
- Creation and updating of Dataplex Catalog entries for each DAG
- Error handling and logging for better troubleshooting (see the sketch below)
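For instance, the per-DAG loop can isolate failures so that one bad DAG does not abort the whole export. This is a sketch, assuming the `extract_dag_metadata` and `create_entry` helpers from the earlier snippets:

```python
import logging

from airflow.models import DagBag
from google.cloud import dataplex_v1

log = logging.getLogger(__name__)


def export_dags_to_dataplex(**context):
    """Iterate over all DAGs and catalog each one, logging failures
    instead of failing the task on the first bad DAG."""
    client = dataplex_v1.CatalogServiceClient()
    for dag in DagBag().dags.values():
        try:
            create_entry(client, extract_dag_metadata(dag))
            log.info("Cataloged DAG %s", dag.dag_id)
        except Exception:
            log.exception("Failed to catalog DAG %s", dag.dag_id)
```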
You can modify the DAG to fit your specific needs:
- Adjust the schedule interval in the DAG definition (see the snippet below)
- Modify the metadata extraction logic in the `create_entry` function
- Customize the Dataplex entry creation in the `create_entry` function
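For example, the schedule lives in the DAG definition. The task ID matches the PythonOperator named above; the DAG ID, start date, and other arguments here are illustrative defaults, not the project's exact values:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="export_dags_to_dataplex",
    schedule_interval="0 0 * * *",  # daily at 00:00; change as needed
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(
        task_id="export_dags_to_dataplex",
        python_callable=export_dags_to_dataplex,
    )
```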
If you encounter issues:
- Check Airflow logs for detailed error messages
- Ensure all required permissions are set correctly and that the Composer/Airflow service account has access to Dataplex Catalog metadata
- Verify that the Dataplex API is enabled and accessible
Contributions to improve this demonstration DAG are welcome. Please submit a pull request with your proposed changes.
This project is licensed under the Apache License, Version 2.0. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.