Add labels to all (BigQuery) resources and processes
adam-phillipps opened this issue · 0 comments
Please describe the feature you'd like to see
It would be very happy-making if I could add arbitrary labels to all BigQuery resources, jobs, queries, etc. that natively support them, so that I can run more accurate cost analysis.
Describe the solution you'd like
Please add tags/labels/etc. to the Table objects or allow us to pass tags and labels through as kwargs in create and load functions.
We are able to add QueryModifiers to some processes, but I'd like to label all queries and the tables they create.
Are there any alternatives to this feature?
I'm not aware of any way around this. I can't remember how it all works in AWS, and I'm not sure at all about Azure, Databricks, etc., but in GCP you can't add labels to things like queries and jobs after they've been submitted. The only way I know to filter during cost analysis is by arbitrary metadata, i.e. cost reporting in AWS and GCP uses tags and labels, respectively.
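To illustrate why the labels need to go in at submission time, here is a minimal sketch that builds a query job configuration in the shape of the BigQuery REST API job configuration, with the labels set before the job is created. The function name build_query_job_configuration and the label values are hypothetical and only for illustration; they are not part of this project's API.

```python
from typing import Dict, Optional


def build_query_job_configuration(
    sql: str, labels: Optional[Dict[str, str]] = None
) -> dict:
    """Return a job configuration dict with labels attached up front.

    Hypothetical helper: the dict mirrors the REST API layout, where
    "labels" sits next to the "query" section of the configuration.
    """
    configuration: dict = {"query": {"query": sql, "useLegacySql": False}}
    if labels:
        # Copy so that later changes to the caller's dict do not leak in.
        configuration["labels"] = dict(labels)
    return configuration


# Example label values are made up for illustration.
config = build_query_job_configuration(
    "SELECT 1", labels={"team": "analytics", "cost_center": "cc-42"}
)
```

A dict like this would be handed to whatever call submits the job, so the labels are already in place when the job starts accruing cost.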
Additional context
This issue is similar. Many users have this need.
It looks like loading a table in BigQuery can be labeled in a few different places. Here are two:
- load_gs_file_to_table can merge user supplied labels with the current label already being sent in
- BigQueryDatabase.load_pandas_dataframe_to_table can take these 4 extra lines to update labels.
destination_table_ref = bigquery.table.TableReference.from_string(
    self.get_table_qualified_name(target_table),
    default_project=project_id,
)
table = self.hook.get_table(destination_table_ref)
table.labels = labels
self.hook.update_table(table, ["labels"])
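For the first option above (merging user-supplied labels with the label already being sent in), a minimal sketch of the merge might look like the following. The merge_labels name, the managed_by default label, and the rule that user labels win on a key clash are all assumptions for illustration, not the project's current behavior.

```python
from typing import Dict, Optional


def merge_labels(
    default_labels: Dict[str, str],
    user_labels: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Combine the labels the library already sends with user-supplied ones.

    Assumption: on a key clash the user-supplied value wins. Neither input
    dict is modified.
    """
    merged = dict(default_labels)
    merged.update(user_labels or {})
    return merged


# "managed_by" stands in for whatever label is sent today (hypothetical).
labels = merge_labels({"managed_by": "sdk"}, {"team": "analytics"})
```

Whether the user or the library should win on a clash is a design choice; the opposite order would protect the library's own label from being overwritten.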
Acceptance Criteria
- All checks and tests in the CI should pass
- Unit tests (90% code coverage or more, once available)
- Integration tests (if the feature relates to a new database or external service)
- Example DAG
- Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an Example DAG showing how it should be used)
- Exception handling in case of errors
- Logging (are we exposing useful information to the user? e.g. source and destination)
- Improve the documentation (README, Sphinx, and any other relevant docs)
- How-to guide for the feature (example)