This repo provides an example of how BigQuery GIS and Remote Functions can be used to load GeoJSON files into BigQuery and determine which customers were potentially impacted by hail storms. The following instructions should help you get started.
Before you start: Using a new GCP project for this example is not a requirement, but it is probably easiest. A dedicated project makes cleanup simpler, since you can delete the whole project to ensure all assets are removed, and it avoids potential conflicts with existing resources. You can also remove the deployed resources by running `terraform destroy` after you deploy them.
1. You'll need to set your Google Cloud project in Cloud Shell, clone this repo locally, and set the working directory to this folder using the following commands:

```
gcloud config set project <PROJECT ID>
git clone https://github.com/shanecglass/hail_demo.git
cd hail_demo
```
2. Check to make sure the Cloud Resource Manager API is enabled.
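One way to check (and enable the API if needed) from Cloud Shell; these are standard `gcloud` commands, not a script from this repo:

```
# Check whether the API is already enabled (prints a row if it is)
gcloud services list --enabled | grep cloudresourcemanager

# Enable the API if it isn't
gcloud services enable cloudresourcemanager.googleapis.com
```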
3. This app uses Cloud Functions, BigQuery, Dataform, and more. Run the following steps to execute the Terraform script that sets everything up.
First, initialize Terraform:

```
terraform init
```

Next, create a `terraform.tfvars` file and open it for editing:

```
touch ./terraform.tfvars
nano ./terraform.tfvars
```
Edit the values for the required variables, save the file, and exit.
```
# This is an example of the terraform.tfvars file.
# The values in this file must match the variable types declared in variables.tf.
# The values in this file override any defaults in variables.tf.

# ID of the project in which you want to deploy the solution
project_id = "PROJECT_ID"

# Google Cloud region where you want to deploy the solution
# Example: us-central1
region = "REGION"

# Whether or not to enable the underlying APIs used in this solution
# Example: true
enable_apis = true

# Whether or not to allow BigQuery resources to be deleted when the solution is modified or destroyed
# Example: false
force_destroy = false

# Whether or not to protect Cloud Storage resources from deletion when the solution is modified or destroyed
# Example: false
deletion_protection = false
```
Validate the configuration:

```
terraform validate
```

If the command returns any errors, make the required corrections in the configuration and then run `terraform validate` again. Repeat this step until the command returns `Success! The configuration is valid.`
Review the resources that are defined in the configuration:

```
terraform plan
```

Then apply the configuration to deploy them:

```
terraform apply
```

When you're prompted to perform the actions, enter `yes`. Terraform displays messages showing the progress of the deployment.
If the deployment can't be completed, Terraform displays the errors that caused the failure. Review the error messages and update the configuration to fix the errors, then run the `terraform apply` command again. For help with troubleshooting Terraform errors, see Errors when deploying the solution using Terraform.
After all the resources are created, Terraform displays the following message:
```
Apply complete!
```
The Terraform output also lists the following additional information that you'll need:
- A link to the Dataform repository that was created
- A link to open the BigQuery editor with some sample queries
4. Click the Dataform link in the Terraform output to create a workspace, then click "Initialize workspace" to get started.
5. Update your Dataform workspace's `dataform.json` file as described below:
```
{
  "defaultSchema": "hail_demo",
  "assertionSchema": "dataform_assertions",
  "warehouse": "bigquery",
  "defaultDatabase": "<YOUR PROJECT ID>",
  "defaultLocation": "us-central1"
}
```
6. Next, add the following files from the definitions folder to your Dataform workspace (a sketch of what these queries do follows this list):

- `load_geojson` - This calls the remote function created by the Terraform script to load the polygons defined in the GeoJSON file into BigQuery
- `convert_customer_geog` - This cleans the sample customer table and converts the `customer_geog` column to a GEOGRAPHY type. This might not be necessary if your customer data already has locations stored as GEOGRAPHY columns in BigQuery
- `customers_impacted` - This finds which customer locations fall within the hail polygon, indicating customers who were impacted by the storm
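If you want a feel for the logic before opening the repo, here is a minimal SQLX sketch of the kind of spatial join `customers_impacted` performs. This is an illustration, not the repo's actual code: the `polygon_geog` column name is an assumption, and the real definitions may differ.

```
config { type: "table" }

-- Hypothetical sketch: keep the customers whose location falls inside
-- a hail polygon. ST_WITHIN(a, b) is TRUE when geography a is inside b.
SELECT
  c.*
FROM ${ref("convert_customer_geog")} AS c
JOIN ${ref("load_geojson")} AS h
  ON ST_WITHIN(c.customer_geog, h.polygon_geog)
```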
To add the files to your workspace:

- Remove the sample view files in your Dataform workspace's definitions folder.
- Click the 3-dot menu next to your Dataform workspace's definitions folder in the UI and select **Create file**.
- Name the file so it matches the examples provided, and be sure to include the `.sqlx` extension. For example, the "Add a file path" field for the `load_geojson` file should read `definitions/load_geojson.sqlx`.
- Copy and paste the code from the file into the Dataform IDE.
7. Once the files are added, click **Start Execution** at the top of the Dataform IDE and select **All Actions** to run the full pipeline. Click the **Start Execution** button in the confirmation screen that pops up to run the workflow. You can see the status of your execution by clicking **Details** in the notification at the bottom of the screen, or by clicking **Executions** at the top of the IDE.
8. From here, click the Looker Studio link in your Terraform outputs to start analyzing the data using a pre-built template. You can also head to the BigQuery console to see the table of customers who were impacted by the hail event. If you need to see your Terraform outputs again, simply run `terraform output` from your command line.
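For example, to reprint all of the outputs or just one value (the output name below is illustrative; check the `terraform output` listing for the actual names in this deployment):

```
# Print all outputs from the deployment
terraform output

# Print a single output value without quotes (output name is illustrative)
terraform output -raw lookerstudio_report_url
```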