Multi-Property Support cannot be used( Looking forward to answering)
echolun opened this issue · 19 comments
Hi, I have a problem about Multi-Property Support. As a novice dbt user, I found that it could not take effect when using GA4 Multi-Property Support. I have consulted the documents and issues in this warehouse, but I could not get an answer.
I want to make statistics on product registration, but the GA data of web side and app side are scattered in two dataset, the GA dataset of web side is analytics_7866
, and the dataset of web side is analytics_9513
. Here is my profiles configuration:
jaffle_shop:
target: dev
outputs:
dev:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: analytics_9513
threads: 4
keyfile: test-data-analytics-bef750568504.json
and this is my dbt_project configuration:
name: "jaffle_shop"
config-version: 2
version: "0.1"
profile: "jaffle_shop"
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analysis"]
macro-paths: ["macros"]
target-path: "target"
clean-targets:
- "target"
- "dbt_modules"
- "logs"
require-dbt-version: [">=1.0.0", "<2.0.0"]
models:
jaffle_shop:
materialized: table
staging:
materialized: view
ga4:
staging:
recommended_events:
stg_ga4__event_sign_up:
+enabled: true
base:
base_ga4__events:
+full_refresh: false
vars:
ga4:
project: "notta-data-analytics"
property_ids: [9513, 7866]
dataset: "analytics_9513"
start_date: "20221010"
static_incremental_days: 2
I'm sure the dbt core, dbt biqquery, and GA4 packages are installed, and I have access to both datasets, but when I run dbt, there are no errors in the log. I found that models with GA4 presets were created under analytics_9513
, and when I went to base_ga4__events
under analytics_9513
to look up the analytics_7866
specific event name, I found that I could not find any records, and at this point base_ga4__events
only contained all the events of analytics_9513
.
My understanding of Multi-Property Support is as follows:
I am not sure if my understanding is wrong. Now I am looking forward to Multi-Property Support to help me aggregate the data of multiple data sets into a base_ga4__events
, which will be conducive to my overall product registration data statistics.
But unfortunately, I can't do that now. I am looking forward to receiving the answer and looking forward to receiving the reply. Thanks.
Also, this is the log when I execute dbt run
, and you can see that there really are no errors:
dbt run
11:44:20 Running with dbt=1.6.6
11:44:21 Registered adapter: bigquery=1.6.7
11:44:21 Unable to do partial parsing because a project config has changed
11:44:23 Found 32 models, 1 seed, 21 tests, 2 sources, 0 exposures, 0 metrics, 524 macros, 0 groups, 0 semantic models
11:44:23
11:44:24 Concurrency: 4 threads (target='dev')
11:44:24
11:44:24 1 of 32 START sql incremental model analytics_9513.base_ga4__events ............ [RUN]
11:44:32 1 of 32 OK created sql incremental model analytics_9513.base_ga4__events ....... [MERGE (0.0 rows, 0 processed) in 8.02s]
11:44:32 2 of 32 START sql view model analytics_9513.stg_ga4__events .................... [RUN]
11:44:33 2 of 32 OK created sql view model analytics_9513.stg_ga4__events ............... [CREATE VIEW (0 processed) in 1.13s]
11:44:33 3 of 32 START sql incremental model analytics_9513.fct_ga4__sessions_daily ..... [RUN]
11:44:33 4 of 32 START sql table model analytics_9513.stg_ga4__client_key_first_last_events [RUN]
11:44:33 5 of 32 START sql view model analytics_9513.stg_ga4__event_click ............... [RUN]
11:44:33 6 of 32 START sql view model analytics_9513.stg_ga4__event_file_download ....... [RUN]
11:44:34 5 of 32 OK created sql view model analytics_9513.stg_ga4__event_click .......... [CREATE VIEW (0 processed) in 1.26s]
11:44:34 7 of 32 START sql view model analytics_9513.stg_ga4__event_first_visit ......... [RUN]
11:44:34 6 of 32 OK created sql view model analytics_9513.stg_ga4__event_file_download .. [CREATE VIEW (0 processed) in 1.46s]
11:44:34 8 of 32 START sql view model analytics_9513.stg_ga4__event_items ............... [RUN]
11:44:35 7 of 32 OK created sql view model analytics_9513.stg_ga4__event_first_visit .... [CREATE VIEW (0 processed) in 1.18s]
11:44:35 9 of 32 START sql view model analytics_9513.stg_ga4__event_page_view ........... [RUN]
11:44:36 8 of 32 OK created sql view model analytics_9513.stg_ga4__event_items .......... [CREATE VIEW (0 processed) in 1.14s]
11:44:36 10 of 32 START sql view model analytics_9513.stg_ga4__event_scroll ............. [RUN]
11:44:36 9 of 32 OK created sql view model analytics_9513.stg_ga4__event_page_view ...... [CREATE VIEW (0 processed) in 1.11s]
11:44:36 11 of 32 START sql view model analytics_9513.stg_ga4__event_session_start ...... [RUN]
11:44:37 10 of 32 OK created sql view model analytics_9513.stg_ga4__event_scroll ........ [CREATE VIEW (0 processed) in 1.19s]
11:44:37 12 of 32 START sql view model analytics_9513.stg_ga4__event_sign_up ............ [RUN]
11:44:37 3 of 32 OK created sql incremental model analytics_9513.fct_ga4__sessions_daily [MERGE (0.0 rows, 0 processed) in 3.86s]
11:44:37 13 of 32 START sql view model analytics_9513.stg_ga4__event_to_query_string_params [RUN]
11:44:37 4 of 32 OK created sql table model analytics_9513.stg_ga4__client_key_first_last_events [CREATE TABLE (27.5k rows, 18.8 MiB processed) in 4.54s]
11:44:37 14 of 32 START sql view model analytics_9513.stg_ga4__event_user_engagement .... [RUN]
11:44:38 11 of 32 OK created sql view model analytics_9513.stg_ga4__event_session_start . [CREATE VIEW (0 processed) in 1.18s]
11:44:38 15 of 32 START sql view model analytics_9513.stg_ga4__event_video_complete ..... [RUN]
11:44:38 13 of 32 OK created sql view model analytics_9513.stg_ga4__event_to_query_string_params [CREATE VIEW (0 processed) in 1.15s]
11:44:38 16 of 32 START sql view model analytics_9513.stg_ga4__event_video_start ........ [RUN]
11:44:38 12 of 32 OK created sql view model analytics_9513.stg_ga4__event_sign_up ....... [CREATE VIEW (0 processed) in 1.22s]
11:44:38 17 of 32 START sql view model analytics_9513.stg_ga4__event_view_search_results [RUN]
11:44:39 14 of 32 OK created sql view model analytics_9513.stg_ga4__event_user_engagement [CREATE VIEW (0 processed) in 1.13s]
11:44:39 18 of 32 START sql view model analytics_9513.stg_ga4__page_engaged_time ........ [RUN]
11:44:39 15 of 32 OK created sql view model analytics_9513.stg_ga4__event_video_complete [CREATE VIEW (0 processed) in 1.14s]
11:44:39 19 of 32 START sql view model analytics_9513.stg_ga4__sessions_first_last_pageviews [RUN]
11:44:39 16 of 32 OK created sql view model analytics_9513.stg_ga4__event_video_start ... [CREATE VIEW (0 processed) in 1.16s]
11:44:39 20 of 32 START sql view model analytics_9513.stg_ga4__sessions_traffic_sources . [RUN]
11:44:39 17 of 32 OK created sql view model analytics_9513.stg_ga4__event_view_search_results [CREATE VIEW (0 processed) in 1.20s]
11:44:39 21 of 32 START sql incremental model analytics_9513.stg_ga4__sessions_traffic_sources_daily [RUN]
11:44:40 18 of 32 OK created sql view model analytics_9513.stg_ga4__page_engaged_time ... [CREATE VIEW (0 processed) in 1.14s]
11:44:40 22 of 32 START sql view model analytics_9513.stg_ga4__user_id_mapping .......... [RUN]
11:44:40 19 of 32 OK created sql view model analytics_9513.stg_ga4__sessions_first_last_pageviews [CREATE VIEW (0 processed) in 1.04s]
11:44:40 23 of 32 START sql view model analytics_9513.stg_ga4_app_sign_app .............. [RUN]
11:44:40 20 of 32 OK created sql view model analytics_9513.stg_ga4__sessions_traffic_sources [CREATE VIEW (0 processed) in 1.11s]
11:44:40 24 of 32 START sql table model analytics_9513.stg_ga4__client_key_first_last_pageviews [RUN]
11:44:41 22 of 32 OK created sql view model analytics_9513.stg_ga4__user_id_mapping ..... [CREATE VIEW (0 processed) in 1.13s]
11:44:41 25 of 32 START sql table model analytics_9513.fct_ga4__sessions ................ [RUN]
11:44:42 23 of 32 OK created sql view model analytics_9513.stg_ga4_app_sign_app ......... [CREATE VIEW (0 processed) in 1.15s]
11:44:42 26 of 32 START sql incremental model analytics_9513.fct_ga4__pages ............. [RUN]
11:44:43 21 of 32 OK created sql incremental model analytics_9513.stg_ga4__sessions_traffic_sources_daily [MERGE (0.0 rows, 29.9 KiB processed) in 3.83s]
11:44:43 27 of 32 START sql table model analytics_9513.dim_ga4__sessions ................ [RUN]
11:44:44 25 of 32 OK created sql table model analytics_9513.fct_ga4__sessions ........... [CREATE TABLE (29.2k rows, 3.0 MiB processed) in 3.31s]
11:44:44 28 of 32 START sql incremental model analytics_9513.stg_ga4__sessions_traffic_sources_last_non_direct_daily [RUN]
11:44:45 24 of 32 OK created sql table model analytics_9513.stg_ga4__client_key_first_last_pageviews [CREATE TABLE (7.8k rows, 16.2 MiB processed) in 4.35s]
11:44:45 29 of 32 START sql table model analytics_9513.fct_ga4__client_keys ............. [RUN]
11:44:46 26 of 32 OK created sql incremental model analytics_9513.fct_ga4__pages ........ [MERGE (0.0 rows, 8.0 MiB processed) in 4.20s]
11:44:46 30 of 32 START sql table model analytics_9513.dim_ga4__client_keys ............. [RUN]
11:44:48 27 of 32 OK created sql table model analytics_9513.dim_ga4__sessions ........... [CREATE TABLE (23.2k rows, 10.6 MiB processed) in 4.59s]
11:44:48 28 of 32 OK created sql incremental model analytics_9513.stg_ga4__sessions_traffic_sources_last_non_direct_daily [MERGE (0.0 rows, 3.4 MiB processed) in 3.39s]
11:44:48 31 of 32 START sql incremental model analytics_9513.dim_ga4__sessions_daily .... [RUN]
11:44:48 29 of 32 OK created sql table model analytics_9513.fct_ga4__client_keys ........ [CREATE TABLE (27.5k rows, 2.8 MiB processed) in 3.64s]
11:44:48 32 of 32 START sql table model analytics_9513.fct_ga4__user_ids ................ [RUN]
11:44:50 30 of 32 OK created sql table model analytics_9513.dim_ga4__client_keys ........ [CREATE TABLE (27.5k rows, 14.0 MiB processed) in 3.69s]
11:44:51 31 of 32 OK created sql incremental model analytics_9513.dim_ga4__sessions_daily [MERGE (0.0 rows, 0 processed) in 3.76s]
11:44:52 32 of 32 OK created sql table model analytics_9513.fct_ga4__user_ids ........... [CREATE TABLE (27.5k rows, 3.4 MiB processed) in 3.89s]
11:44:52
11:44:52 Finished running 6 incremental models, 19 view models, 7 table models in 0 hours 0 minutes and 29.50 seconds (29.50s).
11:44:52
11:44:52 Completed successfully
11:44:52
11:44:52 Done. PASS=32 WARN=0 ERROR=0 SKIP=0 TOTAL=32
Thanks for the thorough issue. I'm not certain what is going on, but I have a few recommendations:
- In your dev profile, you have
dataset: analytics_9513
but typically users will create a dedicated dataset for their own development purposes. This would be something likedev_dbt_echolun
or similar. I wouldn't recommend writing to the source dataset - Similarly, in your
dbt_project.yml
file you havedataset: "analytics_9513"
but the intention here when using multiple GA4 source datasets is to add a new target dataset likeanalytics_combined
. This is where the source data will be copied to. Then yourdev_dbt_echolun
schema will read from this dataset.
I'm not 100% those changes will solve this, but it's worth trying first.
Thank you very much for your reply. First of all, I understand that the location where GA4 creates the default model should not be the same data set as the source data set, so I will create another data set separately, such as dev_ga4_echolun
, and I will modify the profile
configuration to:
jaffle_shop:
target: dev
outputs:
dev:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: dev_ga4_echolun
threads: 4
keyfile: test-data-analytics-bef750568504.json
On your second point, should I create another dataset for Multi-Property Support to hold the merged data? such as analytics_combined
? If so, I will modify my dbt_project
configuration to:
vars:
ga4:
project: "notta-data-analytics"
property_ids: [9513, 7866]
dataset: "analytics_combined"
start_date: "20221010"
static_incremental_days: 2
I actually have another question. I found that both profile
and dbt_project
have a dataset in their GA4 configuration. So far, I can't tell the difference between them very well. I found that the profile dataset determines where GA4's default model will be created, but I didn't understand the role of the dataset under the GA4 configuration, and suppose I created the above two datasets, I still can't understand how dev_ga4_echolun and analytics_combined will work together. Please understand that I may have too many questions. (I tried to use GPT to solve the problems I raised, but all failed)
Of course, I will try it first based on your suggestion. If you have time to read it, I look forward to your reply.
On your second point, should I create another dataset for Multi-Property Support to hold the merged data? such as analytics_combined?
This is correct.
I actually have another question. I found that both profile and dbt_project have a dataset in their GA4 configuration.
The dbt_project.yml
file contains project configuration while the profiles.yml
file contains warehouse connection information.
The dataset in the dbt_project.yml
file in a single-property installation is the dataset into which GA4 is exporting your data. In a multi-property installation, it is the folder that we clone all of your sites into. In the base_ga4__events
model, we have a pre-hook configured which calls a macro that clones all of the source properties into a single folder which lets us interface as all sites without knowing how many sites exist.
Clone operations are free but they take a lot of time.
Your problem could be because you may be trying to process more the one-year's worth of data at a time start_date: "20221010"
. I would expect a timeout error rather then no error, so I think Adam's suggestion will fix the problem that you are looking at, but you'll timeout when you fix that error.
You may need to do a manual clone in BigQuery to get 1-year of data from two sites.
I think the clone takes about 5 minutes per clone so you may be able to clone by increasing your job_execution_timeout_seconds
setting in your profile.yml
file but that's 5 minutes x 60 seconds per minute x 390 days(approx) x 2 sites which should give a timeout of 234,000 seconds or 65 hours.
That's why we recommend setting your base_ga4__events
model to --full-refresh: false
in multi-property installations.
I think you'll need to manually clone. If you do end up going that route, please report here the steps that you took. It would be nice to document the commands needed for that so that it's easier for others to follow.
Hi, I took your advice and now I have created two new data sets under bigquery, one named dev_ga4_test
for profiles and one named analytics_combined
for dbt_project.
To account for the time-out problem you mentioned, I changed the start date of the data I need to aggregate and added the job_execution_timeout_seconds configuration.
Now I modify my profiles
to:
jaffle_shop:
target: dev
outputs:
dev:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: dev_ga4_test
threads: 4
keyfile: test-data-analytics-bef750568504.json
job_execution_timeout_seconds: 3600
And modify my dbt_project
configuration to:
models:
jaffle_shop:
materialized: table
staging:
materialized: view
ga4:
staging:
recommended_events:
stg_ga4__event_sign_up:
+enabled: true
base:
base_ga4__events:
+full_refresh: false
vars:
ga4:
project: "notta-data-analytics"
property_ids: [9513, 7866]
dataset: "analytics_combined"
start_date: "20231102"
static_incremental_days: 2
After I executed dbt seed
, I can see that there is an extra table of type mappings under dev_ga4_test
.
Now when I execute dbt run
, I get the following error:
dbt run
04:22:52 Running with dbt=1.6.6
04:22:53 Registered adapter: bigquery=1.6.7
04:22:53 Found 32 models, 1 seed, 21 tests, 2 sources, 0 exposures, 0 metrics, 524 macros, 0 groups, 0 semantic models
04:22:53
04:22:54 Concurrency: 4 threads (target='dev')
04:22:54
04:22:54 1 of 32 START sql incremental model dev_ga4_test.base_ga4__events .............. [RUN]
04:23:00 BigQuery adapter: https://console.cloud.google.com/bigquery?project=notta-data-analytics&j=bq:US:3a8-7f73-4bf8-9301-3b3c480&page=queryresults
04:23:00 1 of 32 ERROR creating sql incremental model dev_ga4_test.base_ga4__events ..... [ERROR in 5.94s]
04:23:00 2 of 32 SKIP relation dev_ga4_test.stg_ga4__events ............................. [SKIP]
04:23:00 3 of 32 SKIP relation dev_ga4_test.fct_ga4__sessions_daily ..................... [SKIP]
04:23:00 4 of 32 SKIP relation dev_ga4_test.stg_ga4__client_key_first_last_events ....... [SKIP]
04:23:00 5 of 32 SKIP relation dev_ga4_test.stg_ga4__event_click ........................ [SKIP]
04:23:00 7 of 32 SKIP relation dev_ga4_test.stg_ga4__event_first_visit .................. [SKIP]
04:23:00 6 of 32 SKIP relation dev_ga4_test.stg_ga4__event_file_download ................ [SKIP]
04:23:00 8 of 32 SKIP relation dev_ga4_test.stg_ga4__event_items ........................ [SKIP]
04:23:00 9 of 32 SKIP relation dev_ga4_test.stg_ga4__event_page_view .................... [SKIP]
04:23:00 10 of 32 SKIP relation dev_ga4_test.stg_ga4__event_scroll ...................... [SKIP]
04:23:00 11 of 32 SKIP relation dev_ga4_test.stg_ga4__event_session_start ............... [SKIP]
04:23:00 12 of 32 SKIP relation dev_ga4_test.stg_ga4__event_sign_up ..................... [SKIP]
04:23:00 13 of 32 SKIP relation dev_ga4_test.stg_ga4__event_to_query_string_params ...... [SKIP]
04:23:00 14 of 32 SKIP relation dev_ga4_test.stg_ga4__event_user_engagement ............. [SKIP]
04:23:00 15 of 32 SKIP relation dev_ga4_test.stg_ga4__event_video_complete .............. [SKIP]
04:23:00 16 of 32 SKIP relation dev_ga4_test.stg_ga4__event_video_start ................. [SKIP]
04:23:00 17 of 32 SKIP relation dev_ga4_test.stg_ga4__event_view_search_results ......... [SKIP]
04:23:00 18 of 32 SKIP relation dev_ga4_test.stg_ga4__page_engaged_time ................. [SKIP]
04:23:00 19 of 32 SKIP relation dev_ga4_test.stg_ga4__sessions_first_last_pageviews ..... [SKIP]
04:23:00 20 of 32 SKIP relation dev_ga4_test.stg_ga4__sessions_traffic_sources .......... [SKIP]
04:23:00 21 of 32 SKIP relation dev_ga4_test.stg_ga4__sessions_traffic_sources_daily .... [SKIP]
04:23:00 22 of 32 SKIP relation dev_ga4_test.stg_ga4__user_id_mapping ................... [SKIP]
04:23:00 23 of 32 SKIP relation dev_ga4_test.stg_ga4_app_sign_app ....................... [SKIP]
04:23:00 24 of 32 SKIP relation dev_ga4_test.fct_ga4__sessions .......................... [SKIP]
04:23:00 25 of 32 SKIP relation dev_ga4_test.stg_ga4__client_key_first_last_pageviews ... [SKIP]
04:23:00 26 of 32 SKIP relation dev_ga4_test.fct_ga4__pages ............................. [SKIP]
04:23:00 27 of 32 SKIP relation dev_ga4_test.dim_ga4__sessions .......................... [SKIP]
04:23:00 28 of 32 SKIP relation dev_ga4_test.stg_ga4__sessions_traffic_sources_last_non_direct_daily [SKIP]
04:23:00 29 of 32 SKIP relation dev_ga4_test.fct_ga4__client_keys ....................... [SKIP]
04:23:00 30 of 32 SKIP relation dev_ga4_test.dim_ga4__client_keys ....................... [SKIP]
04:23:00 31 of 32 SKIP relation dev_ga4_test.dim_ga4__sessions_daily .................... [SKIP]
04:23:00 32 of 32 SKIP relation dev_ga4_test.fct_ga4__user_ids .......................... [SKIP]
04:23:00
04:23:00 Finished running 6 incremental models, 19 view models, 7 table models in 0 hours 0 minutes and 7.02 seconds (7.02s).
04:23:00
04:23:00 Completed with 1 error and 0 warnings:
04:23:00
04:23:00 Database Error in model base_ga4__events (models/staging/base/base_ga4__events.sql)
notta-data-analytics:analytics_combined.events_* does not match any table.
compiled Code at target/run/ga4/models/staging/base/base_ga4__events.sql
04:23:00
04:23:00 Done. PASS=0 WARN=0 ERROR=1 SKIP=31 TOTAL=32
The error shows notta-data-analytics:analytics_combined.events_* does not match any table.
analytics_combined
is my newly created data set, and there are really no tables under it. Won't GA4 help me aggregate the data of analytics_7866
and analytics_9513
and copy it to analytics_combined
? Or should I have manually merged and copied data from analytics_7866
and analytics_9513
to analytics_combined
in the first place? My previous understanding is that GA4 will help me do this aggregation, which is also the main role of Multi-Property Support
.
Looking forward to your reply,thanks.
In addition, I did a second test, considering the possible timeout caused by too much data, I still used my original configuration, that is, set the data set in the GA4 configuration to my GA source data set, although this is not recommended, and then adjusted the start_date
. The current dbt_project
configuration is:
models:
jaffle_shop:
materialized: table
staging:
materialized: view
ga4:
staging:
recommended_events:
stg_ga4__event_sign_up:
+enabled: true
base:
base_ga4__events:
+full_refresh: false
vars:
ga4:
project: "notta-data-analytics"
property_ids: [9513, 7866]
dataset: "analytics_9513"
start_date: "20231102"
static_incremental_days: 2
and my profiles
configuration is:
jaffle_shop:
target: dev
outputs:
dev:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: analytics_9513
threads: 4
keyfile: notta-data-analytics-bef750568504.json
job_execution_timeout_seconds: 3600
Obviously, I increased the timeout configuration and adjusted the start_date
until November of this year, when I executed dbt run
, I found that the result was the same as before, without any errors. And the base_ga4__events
under analytics_9513
cannot find any buried events about app.
This is the test result information I have obtained so far according to the advice.
Thanks. I believe you would see an error regarding timeouts if this were a timeout issue.
Your source data is not being copied into the analytics_combined
schema which indicates there's an issue running the combine_property_data
macro here: https://github.com/Velir/dbt-ga4/blob/main/macros/combine_property_data.sql
The main component of that macro is this:
CREATE OR REPLACE TABLE `{{var('project')}}.{{var('dataset')}}.events_{{relation_suffix}}{{property_id}}` CLONE `{{var('project')}}.analytics_{{property_id}}.events_{{relation_suffix}}`;
It attempts to create new tables under analytics_combined
by cloning tables from each source dataset (as specified by the properties
variable in dbt).
I'm not sure what's going on the. One thing that is suspicious to me is that your property IDs are so short. Every property ID I've come across is 9 digits but yours are 4. Can you share a screenshot of the datasets and tables you're copying from? Just want to see if I spot anything odd with the formatting of the dataaset and table names.
Of course, in fact, the metadata is in the red box.considering that I am not very familiar with dbt now, I took the last four digits of the original data set id to define two data sets specifically for test learning.
As for time-out, I still haven't seen similar error at present. After I accepted your suggestion to create two data sets, I encountered another error in operation, which I provided in the above answer.
Regarding the id length issue, I am not sure if there is a length limit in the underlying GA4 implementation?
I'm not following why you created 2 new data sets but can you point the dbt-ga4 package to the original analytics_XXXXXXXXX datasets and try again?
Hello, I modified the id of the data set to 9 digits according to your suggestion.this is my dbt project config:
vars:
ga4:
project: "notta-data-analytics"
property_ids: [344009513, 235597866]
dataset: "analytics_combined"
start_date: "20231102"
static_incremental_days: 2
and this is my profiles config:
jaffle_shop:
target: dev
outputs:
dev:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: dev_ga4_test
threads: 4
keyfile: test-data-analytics-bef750568504.json
job_execution_timeout_seconds: 3600
I did a test and found that I encountered the error I asked before. The error message is as follows:
Database Error in model base_ga4__events (models/staging/base/base_ga4__events.sql)
notta-data-analytics:analytics_combined.events_* does not match any table.
compiled Code at target/run/ga4/models/staging/base/base_ga4__events.sql
I don't think the id length is a problem, because the judgment of the data set in combine_property_data
is just a simple concatenation of characters:
{% for property_id in var('property_ids') %}
{%- set schema_name = "analytics_" + property_id|string -%}
And I can be sure that these data sets are real and have access.
I gradually understood the error I raised before. I don't think combine_property_data
is performing as expected.I set the dataset to analytics_9513
in the original question. But analytics_9513
was itself the data source, and it continued to support the subsequent execution of GA4
.So it doesn't make any errors, because I haven't done a data set merge, so I can't find any events outside of this data set.
But now I set the dataset to analytics_combined
, and since this is a new and clean dataset, combine_property_data
execution failed resulting in no tables under analytics_combined
, This also causes all subsequent models of GA4
to fail, resulting in notta-data-analytics:analytics_combined.events_* does not match any table.
I don't know if my guess is correct, but I think combine_property_data
should do the data aggregation first before base_ga4__events
is executed.
I can't explain why combine_property_data
is not executed at the moment, can I execute it manually?
Regarding the error mentioned above, I found that this issue also encountered a similar error, but I am not clear how he solved it.
Looking forward to your reply,thanks.
I've roughly analyzed the logs.This is my verification idea. I add log printing in get_relations_by_pattern, and the code is as follows:
{%- if table_list and table_list['table'] -%}
{%- set tbl_relations = [] -%}
{%- for row in table_list['table'] -%}
{%- set tbl_relation = api.Relation.create(
database=database,
schema=row.table_schema,
identifier=row.table_name,
type=row.table_type
) -%}
{%- do tbl_relations.append(tbl_relation) -%}
{%- endfor -%}
{{- log("Table relations: " ~ tbl_relations) -}}
{{ return(tbl_relations) }}
{%- else -%}
{{ return([]) }}
{%- endif -%}
I try log tbl_relations at the end of the loop, which is the first step, and then this is my current dbt_project configuration:
vars:
ga4:
project: "demo-data-analytics"
dataset: "analytics_combined"
property_ids: [355009513, 244597866]
static_incremental_days: 2
start_date: "20231112"
frequency: "daily"
Notice that my start_date
is set to 20231112
, and it's only 20231114
today, after which I run dbt run --debug
, I will take a screenshot of some of the logs that I think are important:
In the figure above, you can see the data collection start time I set and the log tbl_relations I added, which is a very huge data that basically contains several years of data. Moreover, another data set of the whole year is also output in the log below. I am confused. When my start_date is set to 20231110, why does it still find all the data in both datasets?
Then the log comes to combine_property_data
and we can see that we are now going to create a new table and merge it, and the merge here really only handles events_20231112
and events_20231113
, which can't be a huge amount of data. But the creation here failed because at the end of the execution, I couldn't find any new tables under analytics_combined
.
Finally, GA4 would create GA4 default tables such as base_ga4__events
based on the merged data set, but the natural execution failed because the merge failed and analytics_combined
was an empty data set.
This is my log analysis of the entire dbt run. Because I don't see any other useful errors in the entire log, I just know the data set and fail, even though it's not a very large amount of data.
I really need your help. At the moment, I have no way of knowing why multiple data sets fail.I hope these log information can help you analyze the problem. If you need more information, let me provide it.
I'm really sorry this isn't working for you, but it's hard to pinpoint the issue. All of the code handling the partition copy into the "combined" dataset is contained here: https://github.com/Velir/dbt-ga4/blob/main/macros/combine_property_data.sql
It's only 38 lines of code, so I recommend trying to run each step manually and determining which step is failing.
Hi, after reading the combine_property_data.sql
merge process, I found that GA4 only does the simple table merge, I always thought that GA4 would also do part of the pre-processing in the merge process, which is why I have always wanted to do the merge through GA4. But now I have decided to merge the tables manually (iteration time is tight), because even if I solve the merge problem, I may still be unable to circumvent the time-out problem caused by too much data. I have no trouble with this problem, thank you very much for your answer.
But now I have another problem. I find that the target dataset written by the GA4 model is the schema attribute of dbt_project combined with the dataset of profiles. For example:
This is my profiles:
bigquery_profile:
target: prod
outputs:
dev:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: dev
....
prod:
type: bigquery
method: service-account
project: notta-data-analytics
dataset: dbt_models
....
This is how my dbt is configured for different directories
ga4:
staging:
+schema: "{{ 'mc_data_statistics' if target.name == 'dev' else 'ga4' }}"
base:
base_ga4__events:
+full_refresh: false
recommended_events:
stg_ga4__event_sign_up:
+enabled: true
When I run dbt run --target prod
, the model is written to the dbt_models
combined with ga4
which is the dbt_models_ga4
dataset.
When I run dbt run --target dev
, the target book to which the model is written will be dev_mc_data_statistics
.
But I found that the two files stg_ga4__sessions_traffic_sources
and stg_ga4__sessions_traffic_sources_daily
in the GA4 package rely on ga4_source_categories
.However, the target dataset it seeks will not be concatenated as above, but directly to the dev
or dbt_models
set by the DATASET to look for, It does not dynamically concatenate schemas,resulting in no file found and an error.
Now I have to modify the GA4 source code to manually specify the ga4_source_categories
position, I am not sure if there is a better way to solve this, at present I can only import GA4 as a local package and modify the source code, I think this is not a good practice. This will also cause me to have some trouble in upgrading GA4 in the future.
Before modification:
After modification:
ga4_source_categories
is a seed file, so I'm pretty sure you just need to set the target schema in the dbt_project.yml just like you did for the staging models.
you have:
ga4:
staging:
+schema: "{{ 'mc_data_statistics' if target.name == 'dev' else 'ga4' }}"
But the path to the seed file is:
ga4:
seeds:
(I think :) )
Oh my God, you're right, I was so focused on the models that I neglected that seed should also be configured to determine dev or prod. I tried it though, and the correct configuration is that GA4 should be under seed, which is my current configuration, and it really solved my problem, thank you very much!!!!
I may have one last question about GA4 😂, at the moment all I really need is the GA event for login registration, but I find that GA4 has a lot of models preset that these events will be executed every time I run dbt run
.
Since my data volume is very large, and bigquery is billed by query volume, I want some events that are executed by default in GA4 to no longer be executed, and I really don't need them. Can I disable events through GA4 configuration? I don't see anything like that in the GA4 documentation.
The default materialization is view
which means that creating those event models (ex: stg_ga4__event_click
) and running your dbt job won't cause any billing. You'll only be billed if you query that model.
That said, you can disable any models you like in your dbt_project.yml
https://docs.getdbt.com/reference/resource-configs/enabled
Also see here: https://github.com/Velir/dbt-ga4/tree/main/models/staging/recommended_events
Thanks for the reply, I now know how to disable some models.but what I didn't understand is that default models like stg_ga4__event_click
rely on stg_ga4__events
. And stg_ga4__events
depends on base_ga4__events
, where the model refinement depends on queries step by step, won't there be a cost?
My apologies. I've been away on holidays. Thank you Adam for helping out here.
Those models are Views. Views are basically aliases to a query that would produce that model presented as if it actually existed but the model does not exist.
They are best used for models that are steps in building production models but aren't actually queried except in the build process.
You also see them used for joining production models so that users don't have to perform that join in the data vis tool which can often be clunky.
Those models don't cost anything more, they just give us a place to perform transformations on all event data (stg_ga4__events
), or specific event data (stg_ga4__event_click
). Because they use select *
, they pick up upstream transformations.
For events, it's expected that you will add a fact table, fct_ga4__event_click
on top of those models materialized as a table or incremental table where you select the data that you actually want users to have access to.
Thank you very much for your reply, until now I did not have a very deep understanding of tables and views, now I understand that views are only temporary stores of procedure steps, when I run dbt run --models stg_ga4__event_click
it actually only saves the query statement, but does not query it immediately. I've been misunderstanding this before.
Thanks to Damon and Adam for helping me answer a lot of questions, I have no more questions, I will close this issue.
Have a nice day!