Super rough outline for 'How to get started with dbt metrics and BigQuery'.
- Extract and Load some data into BigQuery
- (Optional) Set up billing in Google Cloud and attach it to the Google Cloud project you are going to use (you could use a local Postgres instead). Some of the DML commands I used required billing to be set up. I also set up a billing budget to try to avoid a surprise bill (again, a local Postgres would avoid that risk).
- (Optional) I set up a new project in GCloud:
metricslayer
- Create a new dataset in BigQuery. In my case I made a US-region dataset and called it
austin_bikes_metrics
- Create a new table in BigQuery. This is where the extracted data will go. My table schema was the same schema as
bigquery-public-data.austin_bikeshare.bikeshare_trips
as that is where I "extracted" the "raw" data from.
- Extract the data. I took only a sample of the data, because I wanted later steps to run fast.
INSERT INTO metricslayer.austin_bikes_metrics.bikeshare_rides SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` WHERE rand() < 0.01
Change the table you insert into to match your own gcloud project, dataset, and desired table name.
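The load step above can also be sketched with the `bq` command-line tool. This is a rough sketch, not the exact procedure used for this project: it swaps the separate "create table" and "INSERT" steps for a single `CREATE TABLE ... AS SELECT`, which copies the schema and the 1% sample in one statement. The `run` helper and the dry-run default are assumptions added so that nothing is billed until you opt in.

```shell
#!/bin/sh
set -eu

# Names from this outline; replace them with your own project, dataset, and table.
PROJECT="metricslayer"
DATASET="austin_bikes_metrics"
TABLE="bikeshare_rides"
SOURCE_TABLE="bigquery-public-data.austin_bikeshare.bikeshare_trips"

# One statement that creates the table with the source schema and loads
# a random sample of roughly 1% of the rows.
SAMPLE_SQL="CREATE TABLE IF NOT EXISTS \`${PROJECT}.${DATASET}.${TABLE}\` AS
SELECT * FROM \`${SOURCE_TABLE}\` WHERE RAND() < 0.01"

# Hypothetical helper: print each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Create the US-region dataset (this errors if the dataset already exists).
run bq --location=US mk --dataset "${PROJECT}:${DATASET}"

# Run the sampling query with standard SQL, billed to your project.
run bq --project_id="${PROJECT}" query --use_legacy_sql=false "${SAMPLE_SQL}"
```

Run it once as-is to review the printed commands, then again with `DRY_RUN=0` to execute them.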
- Set up dbt for local development
- Install dbt for your target database. Directions here (except set up for BigQuery, not Postgres):
https://blog.jetbrains.com/big-data-tools/2022/01/25/how-i-started-out-with-dbt/
For the BigQuery adapter, see: https://docs.getdbt.com/docs/available-adapters
- Create a service account in gcloud (or reuse one if you already have it). Directions here: https://medium.com/@ivan_toriya/step-by-step-guide-to-run-dbt-in-production-with-google-cloud-platform-fb1f035f3c7b
- Create or add a JSON key for the service account and download it. Do not place this key anywhere public.
- Set up a dbt profile. Run
dbt init
to do this. Use the service account and the JSON key you just downloaded.
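For reference, the profile that `dbt init` produces for a BigQuery service account looks roughly like the file written below. This is a minimal sketch under assumptions: the profile name `metricslayer_profile`, the key path, and the scratch directory `dbt_profiles_example` are all hypothetical placeholders, and the file is written to a scratch directory so it cannot overwrite a real `~/.dbt/profiles.yml`.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholders; substitute your own values.
PROFILE_NAME="metricslayer_profile"
KEYFILE="/path/to/service-account-key.json"
PROFILES_DIR="./dbt_profiles_example"

mkdir -p "${PROFILES_DIR}"

# Write a BigQuery profile that authenticates with the downloaded JSON key.
cat > "${PROFILES_DIR}/profiles.yml" <<EOF
${PROFILE_NAME}:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: metricslayer
      dataset: austin_bikes_metrics
      keyfile: ${KEYFILE}
      location: US
      threads: 4
EOF

echo "Wrote ${PROFILES_DIR}/profiles.yml"

# With the adapter installed (python -m pip install dbt-bigquery),
# the connection can be checked from inside the dbt project with:
#   dbt debug --profiles-dir "${PROFILES_DIR}"
```

Keep the key file itself outside the repository; only its path belongs in the profile.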
- Configure and run dbt.
- Clone or fork this project.
- Change
metricslayer
everywhere to your project name.
- Change
austin_bikes_metrics
everywhere to your dataset name.
- If you picked a table name other than
bikeshare_rides
change that in the source model.
- In the
dbt_project.yml
file, change the
profile:
parameter to match the profile from step 2 above.
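The "change everywhere" steps above can be done by hand or with a small find-and-replace script. The sketch below is one possible approach, not part of the project: `replace_everywhere` is a hypothetical helper, `my-gcp-project` and `my_dataset` are placeholder names, and the `sources.yml` it edits is a stand-in created in a temporary directory rather than the repository's real source file.

```shell
#!/bin/sh
set -eu

# replace_everywhere OLD NEW DIR
# Rewrite OLD to NEW in every .yml and .sql file under DIR.
# Assumes neither name contains "/" or other characters special to sed.
replace_everywhere() {
  old="$1"; new="$2"; dir="$3"
  find "$dir" -type f \( -name '*.yml' -o -name '*.sql' \) \
    -exec grep -l -- "$old" {} + |
  while IFS= read -r file; do
    # -i.bak works with both GNU and BSD sed; remove the backup afterwards.
    sed -i.bak "s/${old}/${new}/g" "$file"
    rm -f "${file}.bak"
  done
}

# Demo on a throwaway copy so the real project is untouched.
DEMO_DIR="$(mktemp -d)"
mkdir -p "${DEMO_DIR}/models"
cat > "${DEMO_DIR}/models/sources.yml" <<'EOF'
version: 2
sources:
  - name: austin_bikes_metrics
    database: metricslayer
    tables:
      - name: bikeshare_rides
EOF

replace_everywhere metricslayer my-gcp-project "${DEMO_DIR}"
replace_everywhere austin_bikes_metrics my_dataset "${DEMO_DIR}"

cat "${DEMO_DIR}/models/sources.yml"
```

To apply it to the cloned project, call `replace_everywhere` with the project directory instead of `DEMO_DIR`, and review the result with `git diff` before running dbt.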
Try running the following commands:
- dbt deps
- dbt run
- dbt test
- dbt docs generate
- dbt docs serve
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices