Sync released data models to BQ tables
adamjtaylor opened this issue · 4 comments
As a data manager for HTAN I would like our internal BigQuery `htan-dcc:metadata` tables to include

- `data-model_main`: an accurate reflection of the main branch of the data model
- `data-model_vYY.MM.minor`: a table for every version of the data model
- `data-model_latest`: a table that reflects the latest released version of the data model
This lets us use the data model in queries against our submitted manifests or other information held in BigQuery.
We can extend the bq-schema workflow as follows.

Add a trigger so that the workflow also runs when a release is created
```yaml
on:
  push:
    branches: main
    paths: 'HTAN.model.csv'
  release:
    types: [created]
  workflow_dispatch:
```
Add a job to create a versioned table if the event name is release
```yaml
add-versioned-table:
  name: Add versioned schema to BQ
  runs-on: ubuntu-latest
  needs: add-to-bq
  if: github.event_name == 'release'
```
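Since the versioned table name is derived from the release tag, a first step in this job could check that the tag actually follows the `vYY.MM.minor` pattern before anything is written to BigQuery. A minimal sketch, assuming a two-digit year, a one- or two-digit month and a numeric minor version; `is_versioned_tag` is a hypothetical helper, not part of the existing workflow:

```shell
#!/bin/sh
# Hypothetical check: does a release tag follow the vYY.MM.minor pattern?
# Assumed format: "v", two-digit year, one- or two-digit month, numeric minor.
is_versioned_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]{2}\.[0-9]{1,2}\.[0-9]+$'
}

# Example usage with an illustrative tag value.
if is_versioned_tag "v24.8.1"; then
  echo "tag ok"
else
  echo "unexpected tag format" >&2
fi
```

In the workflow the tag would come from `github.event.release.tag_name` rather than a literal, and the step could exit non-zero on a mismatch so that no oddly named table is created.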
Then duplicate the versioned table as latest
```yaml
- name: Duplicate versioned table as latest
  shell: bash
  run: |
    VERSION=${{ github.event.release.tag_name }}
    # -f overwrites data_model_latest if it already exists, without prompting
    bq cp -f htan-dcc:metadata.data_model_${VERSION} htan-dcc:metadata.data_model_latest
```
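One thing to check before relying on this step: a tag such as `v24.8.1` contains dots, and as far as I know BigQuery table IDs do not accept dots (this is worth confirming against the BigQuery naming rules). A minimal sketch of one way to handle that, assuming we are happy to map every character outside letters, digits and underscores to an underscore; `tag_to_table_suffix` is a hypothetical helper and the tag value is illustrative:

```shell
#!/bin/sh
# Hypothetical helper: map a release tag to a suffix that is safe in a table ID.
tag_to_table_suffix() {
  # Replace every character outside [A-Za-z0-9_] with an underscore.
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_'
}

VERSION_SUFFIX=$(tag_to_table_suffix "v24.8.1")

# Print the command instead of running it, so the sketch has no side effects.
# -f asks bq to overwrite the destination table without prompting.
echo "bq cp -f htan-dcc:metadata.data_model_${VERSION_SUFFIX} htan-dcc:metadata.data_model_latest"
```

The same suffix would need to be used by the job that creates the versioned table, otherwise the copy would point at a table that does not exist.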
Please add a "critical" label if this is expected within phase 1.0, or a "renewal" label if it can wait.
Need to discuss with ISB during data flow discussions. @aclayton555 tag in flow diagram. Need to understand how users are engaging with BQ
Currently, there is a workflow to sync the staging version of the data model with the BQ tables. It is used to help populate attribute descriptions. This is still in use, but @PozhidayevaDarya has not run it for a little while. TBD on updating this to include the attributes listed here AND updating it so that it no longer points to staging.
24-8 Close-out: take this into consideration in the data model design doc. Need to understand what is needed here and the needed architecture.