bq-metadata-cloud-run

Set up a scheduled Cloud Run job to update HTAN metadata tables in BigQuery daily.

A Cloud Run job that pulls metadata manifests from Synapse and updates tables in the Google BigQuery dataset htan-dcc.combined_assays. This dataset contains clinical, biospecimen, and assay metadata tables combined across HTAN centers.
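
To see which tables the job maintains, you can list the dataset with the bq CLI (assuming you have read access to htan-dcc):

bq ls htan-dcc:combined_assays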

The job is scheduled to run daily at 02:00 ET.
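
The trigger itself is created by the Terraform configuration described below, presumably as a Cloud Scheduler job; 02:00 ET corresponds to the cron expression 0 2 * * * in the America/New_York time zone. To inspect the schedule after deployment (the scheduler job name and location here are assumptions):

gcloud scheduler jobs describe <scheduler-job-name> --location us-east1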

Requirements

Deploying resources in the HTAN Google Cloud project, htan-dcc, requires access. Please contact an owner of htan-dcc to request it (owners in 2024: Clarisse Lau, Vesteinn Thorsson, William Longabaugh, ISB).

Prerequisites

  • Create a Synapse Auth Token secret in Secret Manager (see the sketch after this list). The token requires download access to all individual HTAN-center Synapse projects; the job currently uses the synapse-service-HTAN-lambda service account.

  • Install Terraform >= 1.7.0
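
A minimal sketch of creating the token secret with gcloud; the secret name synapse-auth-token is an assumption, so check the Terraform variables for the name the job actually expects:

# Create the secret with the token as its first version (name is illustrative)
printf '%s' "$SYNAPSE_AUTH_TOKEN" | gcloud secrets create synapse-auth-token \
    --replication-policy="automatic" \
    --data-file=-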

Docker Image

Before creating the job, build and push a Docker image to Google Artifact Registry (recommended):

cd src
docker build . -t us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
docker push us-docker.pkg.dev/<gc-project>/gcr.io/<image-name>
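
If the push fails with an authentication error, you may first need to register gcloud as a Docker credential helper for the Artifact Registry domain:

gcloud auth configure-docker us-docker.pkg.dev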

Deploy Cloud Resources

Define variables in terraform.tfvars. Variable descriptions can be found in variables.tf.
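
For example, a terraform.tfvars might look like the following; the variable names and values here are illustrative, so consult variables.tf for the actual names:

# terraform.tfvars -- all names and values below are assumptions
project = "htan-dcc"
region  = "us-east1"
image   = "us-docker.pkg.dev/htan-dcc/gcr.io/<image-name>"

Then deploy: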

terraform init
terraform plan
terraform apply
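
Once terraform apply completes, you can sanity-check the deployment and trigger a one-off run without waiting for the schedule; the job name and region below are assumptions:

# List deployed Cloud Run jobs
gcloud run jobs list --region us-east1

# Execute the job immediately
gcloud run jobs execute <job-name> --region us-east1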