fhir-dbt-analytics

Data quality analytics for FHIR exported to BigQuery.

Primary language: Shell. License: Apache-2.0.

Getting started


What is FHIR dbt analytics?

A dbt project which produces data-quality analytics from FHIR resources stored in BigQuery.

Use the metrics in fhir-dbt-analytics to check the quality of clinical data. The metrics might count the number of FHIR resources to compare to expected counts or check references between FHIR resources, such as between patients and encounters. Some metrics can help you check the distribution of coded values in your data. You can run all the metrics as a suite, selected metrics, or individually.

Many of the metrics also break results down by dimension. For instance, the encounter_count metric can show counts for each encounter class (such as inpatient, emergency, or ambulatory).

The project includes the following elements:

  • built-in metrics (parameterized so you can easily extend them) to measure clinical data quality
  • views which aggregate the results ready for your data-visualization tools

These analytics run on dbt, an open-source data-transformation tool. If you’re already analyzing FHIR data with dbt, you can also reuse the macros from this project. The dbt macros can help you build patient cohorts, navigate and extract values from FHIR resources, or inspect BigQuery datasets. The dbt selectors group metrics into themes so that you can run just the metrics you’re interested in.

What you'll need

Before you can run this project, you’ll need the following:

Install the project

To install the project, run the following commands in your terminal. They clone the repository into a new folder in the current directory and then switch to that folder:

git clone https://github.com/google/fhir-dbt-analytics
cd fhir-dbt-analytics

Set up dbt outputs

Open profiles.yml and fill in the project and dataset as indicated in the file.
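Before running any models, it can help to confirm that dbt can read profiles.yml and reach BigQuery. The following is a minimal sketch, assuming that dbt and its BigQuery adapter are already installed and that profiles.yml sits in the current (project) directory. The function name check_dbt_setup is illustrative and not part of the project.

```shell
# Hypothetical helper (not part of fhir-dbt-analytics): asks dbt to validate
# the profile and test the connection to BigQuery.
check_dbt_setup() {
  # --profiles-dir . makes dbt look for profiles.yml in the current directory
  # (assumption: you are in the project folder and the file lives there).
  dbt debug --profiles-dir .
}
```

Call check_dbt_setup from the project directory. If dbt reports a failed connection, recheck the project and dataset values in profiles.yml.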

Set up source data

By default, the source data come from the Synthea Generated Synthetic Data in FHIR public dataset. You can try the project against this dataset by leaving the defaults unchanged.

To analyze your own data, export it to BigQuery from a Google Cloud FHIR store by following Storing healthcare data in BigQuery. Then point the project variables at the exported dataset by editing the dbt_project.yml file:

  • database: The name of a Google Cloud project which contains your FHIR BigQuery dataset. For example, bigquery-public-data.
  • schema: The name of your FHIR BigQuery dataset. For example, fhir_synthea.
  • timezone_default: The IANA time-zone name. For example, Europe/London.
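If you would rather not edit dbt_project.yml while experimenting, dbt's --vars option can override project variables for a single invocation. The sketch below assumes that database, schema, and timezone_default are declared under vars: in dbt_project.yml, which the description above suggests but does not confirm. The function name run_with_source is illustrative.

```shell
# Hypothetical helper (not part of fhir-dbt-analytics): runs the metrics
# against a chosen FHIR dataset without editing dbt_project.yml.
#   $1: Google Cloud project containing the FHIR BigQuery dataset
#   $2: name of the FHIR BigQuery dataset
#   $3: IANA time-zone name
run_with_source() {
  # Build the dictionary that --vars expects. Assumption: these three names
  # are declared as vars in dbt_project.yml; otherwise dbt will not use them.
  vars=$(printf '{"database": "%s", "schema": "%s", "timezone_default": "%s"}' "$1" "$2" "$3")
  dbt run --vars "$vars"
}
```

For example, run_with_source bigquery-public-data fhir_synthea Europe/London reproduces the example values listed above.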

Run the project

First time

The first time that you run the project, you need to install dependent packages and seed static data by running the following commands in the project directory:

dbt deps
dbt seed

Analytics

Now you're ready to create the analytics by running the following two commands in your terminal:

dbt run
dbt run --selector post_processing

dbt run runs all the data quality metrics in the project. To save time, you can run a selection of metrics if you include a selector argument from selectors.yml. For example, to run only the Encounter metrics, use dbt run --selector resource_encounter.

dbt run --selector post_processing runs models that consolidate the metric outputs.
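The two steps above can be wrapped in a small shell function so that post-processing always follows the metric run. This is only a sketch: the function name run_metrics is illustrative, and the selector names come from the selectors.yml file mentioned above.

```shell
# Hypothetical helper (not part of fhir-dbt-analytics): runs the metrics,
# optionally restricted to one selector, then consolidates the outputs.
#   $1 (optional): a selector name from selectors.yml, e.g. resource_encounter
run_metrics() {
  selector="${1:-}"
  if [ -n "$selector" ]; then
    # Run only the metrics gathered by the given selector.
    dbt run --selector "$selector" || return 1
  else
    # No selector given: run every metric in the project.
    dbt run || return 1
  fi
  # Consolidate the metric outputs; skipped if the step above failed.
  dbt run --selector post_processing
}
```

Calling run_metrics with no argument runs the full suite, while run_metrics resource_encounter limits the first step to the Encounter metrics.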

After both of these commands have successfully run, you can inspect the tables and views created in the BigQuery dataset that you specified within profiles.yml. Two key tables created are:

  • metric: union of all metric outputs at the most granular level
  • metric_definition: metric definitions, one row per metric

A good place to start is querying the metric_by_system view that joins these two tables together and calculates overall metric values. The output of this view is one row per metric.
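One way to take that first look from the terminal is the bq command-line tool. The sketch below assumes that bq (part of the Google Cloud CLI) is installed and authenticated. The function name preview_metrics is illustrative, and the query selects all columns because the view's column names are not listed here.

```shell
# Hypothetical helper (not part of fhir-dbt-analytics): previews the
# metric_by_system view in the output dataset.
#   $1: Google Cloud project set in profiles.yml
#   $2: BigQuery dataset set in profiles.yml
preview_metrics() {
  # Backticks are escaped so the shell passes them through to BigQuery,
  # where they quote the fully qualified view name.
  bq query --use_legacy_sql=false --max_rows=20 \
    "SELECT * FROM \`$1.$2.metric_by_system\`"
}
```

Pass the same project and dataset that you entered in profiles.yml, for example preview_metrics my-project my_dataset (both names are placeholders).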

Once you have confirmed that metrics are being generated, read the project overview to understand the project structure, and then read extending the project to learn how to add metrics of your own.

Support

fhir-dbt-analytics is not an officially supported Google product. The project is a work in progress, so expect additional metrics and other content to be added, as well as potentially breaking changes as we refine the project structure.

If you believe that something’s not working, please create a GitHub issue.