Cannot make use of Multi-Property Support
Closed this issue · 8 comments
I'm having trouble using Multi-Property Support, could you give me some hints on how to fix it? There are multiple GA4 datasets in my BigQuery project, such as analytics_11111111 and analytics_22222222. I followed the instructions in this document [1] and wrote the following in dbt_project.yml.
models:
dev:
...
googleanalytics:
schema: googleanalytics
+tags:
- "googleanalytics"
vars:
ga4:
property_ids: [11111111, 22222222]
static_incremental_days: 3
project: "my-project"
dataset: "ga4"
start_date: "20230601"
frequency: "daily"
I also wrote the following in dbt/models/googleanalytics/daily_sessions.sql.
select * from {{ ref('dim_ga4__sessions_daily') }} LIMIT 1000
When I run dbt run --select tag:googleanalytics, I get the following result.
> dbt run --select tag:googleanalytics
10:53:05 Running with dbt=1.4.6
10:53:06 Unable to do partial parsing because a project config has changed
10:53:07 Found 122 models, 21 tests, 0 snapshots, 0 analyses, 474 macros, 0 operations, 1 seed file, 67 sources, 0 exposures, 0 metrics
10:53:07
10:53:14 Concurrency: 1 threads (target='dev')
10:53:14
10:53:14 1 of 1 START sql view model googleanalytics.daily_sessions ..................... [RUN]
10:53:15 1 of 1 ERROR creating sql view model googleanalytics.daily_sessions ............ [ERROR in 0.73s]
10:53:15
10:53:15 Finished running 1 view model in 0 hours 0 minutes and 7.59 seconds (7.59s).
10:53:15
10:53:15 Completed with 1 error and 0 warnings:
10:53:15
10:53:15 Runtime Error in model daily_sessions (models/googleanalytics/daily_sessions.sql)
10:53:15 404 Not found: Table my-project:None.dim_ga4__sessions_daily was not found in location my-location
10:53:15
10:53:15 Location: my-location
10:53:15 Job ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10:53:15
10:53:15
10:53:15 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
What am I doing wrong?
[1]https://github.com/Velir/dbt-ga4/tree/4.0.0/#multi-property-support
Is that the actual indentation in your dbt_project.yml file? If so, the problem is with indentation. Indentation is really important for YAML files.
Presuming indentation is not the issue, change the key in your models block to match the package name, which is 'ga4' and not 'googleanalytics'. You can use 'googleanalytics' as a tag, but the identifier for the models and vars blocks is 'ga4'.
Here's what I think it should look like with the corrected identifier and spacing.
models:
dev:
...
ga4:
+schema: googleanalytics
+tags: "googleanalytics"
vars:
ga4:
property_ids: [11111111, 22222222]
static_incremental_days: 3
project: "my-project"
dataset: "ga4"
start_date: "20230601"
frequency: "daily"
The reason that it tried to run one model is that your daily_sessions.sql model is in your project (i.e. outside the package). The package settings aren't recognized because you're using the wrong identifier.
Try these settings.
@dgitis
Thank you for your reply.
Regarding the indentation, it was a problem that occurred when I pasted it, and in my actual configuration file, the indentation is correct. I apologize for causing confusion on that point.
Taking your advice into consideration, I changed dbt_project.yml as follows and ran dbt run --select tag:googleanalytics, which resulted in the following output:
models:
dev:
...
ga4:
+schema: googleanalytics
+tags: "googleanalytics"
vars:
ga4:
property_ids: [11111111, 22222222]
static_incremental_days: 3
project: "my-project"
dataset: "ga4"
start_date: "20230601"
frequency: "daily"
result
> dbt run --select tag:googleanalytics
04:09:47 Running with dbt=1.4.6
04:09:47 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.dev.ga4
04:09:47 Found 122 models, 21 tests, 0 snapshots, 0 analyses, 474 macros, 0 operations, 1 seed file, 69 sources, 0 exposures, 0 metrics
04:09:47 The selection criterion 'tag:googleanalytics' does not match any nodes
04:09:47
04:09:47 Nothing to do. Try checking your model configs and model specification args
Also, I moved dbt/models/googleanalytics/daily_sessions.sql to dbt/models/ga4/daily_sessions.sql and ran it again, but the same error occurred: 404 Not found: Table my-project:None.dim_ga4__sessions_daily was not found in location my-location
I searched for the query that was executed from the job id in BigQuery, and found that the following SQL was executed.
/* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "default", "target_name": "dev", "node_id": "model.dev.daily_session"} */
create or replace view `my-project`.`googleanalytics`.`daily_session`
OPTIONS()
as select * from `my-project`.`None`.`dim_ga4__sessions_daily` limit 1000;
I still need your help. Could you give me any ideas?
This line in the error message:
There are 1 unused configuration paths:
- models.dev.ga4
That indicates to me that you have the nesting wrong, as ga4 is nested under dev.
YAML is very picky about nesting. dbt Cloud does some stuff to fix this, but when things like this go wrong, be sure that you are using two spaces (and not a tab) for each level of nesting.
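A minimal sketch of the nesting being described, assuming the project is named dev (as the models.dev.ga4 path in the warning suggests) and keeps its own models in a googleanalytics folder. The ga4 key sits at the same level as dev under models:, because it names the package rather than a folder inside the project. The values are the ones from this thread; the layout is an illustration, not a confirmed copy of anyone's file.

```yaml
models:
  dev:                      # the project's own models (project name assumed from the warning)
    googleanalytics:        # folder models/googleanalytics inside the project
      +schema: googleanalytics
      +tags: "googleanalytics"
  ga4:                      # the dbt-ga4 package: a sibling of dev, not a child
    +schema: googleanalytics
    +tags: "googleanalytics"

vars:
  ga4:                      # vars are scoped by package name as well
    property_ids: [11111111, 22222222]
    static_incremental_days: 3
    project: "my-project"
    dataset: "ga4"
    start_date: "20230601"
    frequency: "daily"
```

Each level is indented by two spaces, with no tabs.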
@dgitis
Thank you for your response.
As for the nesting issue, you were right and I was entirely wrong. I've corrected my dbt_project.yml. I apologize for repeatedly making such simple mistakes. (I'm actually new to dbt.)
However, the result of the dbt run execution is as follows.
> dbt run --select tag:googleanalytics
09:42:18 Running with dbt=1.4.6
09:42:18 Found 122 models, 21 tests, 0 snapshots, 0 analyses, 474 macros, 0 operations, 1 seed file, 69 sources, 0 exposures, 0 metrics
09:42:18
09:42:20 Concurrency: 1 threads (target='dev')
09:42:20
09:42:20 1 of 30 START sql incremental model googleanalytics.base_ga4__events ........... [RUN]
09:42:23 BigQuery adapter: https://console.cloud.google.com/bigquery?project=my-project&....
09:42:23 1 of 30 ERROR creating sql incremental model googleanalytics.base_ga4__events .. [ERROR in 3.17s]
09:42:23 2 of 30 SKIP relation googleanalytics.stg_ga4__events .......................... [SKIP]
09:42:23 3 of 30 SKIP relation googleanalytics.fct_ga4__sessions_daily .................. [SKIP]
...........
09:42:23 30 of 30 SKIP relation googleanalytics.fct_ga4__user_ids ....................... [SKIP]
09:42:23
09:42:23 Finished running 6 incremental models, 17 view models, 7 table models in 0 hours 0 minutes and 5.30 seconds (5.30s).
09:42:23
09:42:23 Completed with 1 error and 0 warnings:
09:42:23
09:42:23 Database Error in model base_ga4__events (models/staging/base/base_ga4__events.sql)
09:42:23 my-project:googleanalytics.events_* does not match any table.
09:42:23 compiled Code at target/run/ga4/models/staging/base/base_ga4__events.sql
09:42:23
09:42:23 Done. PASS=0 WARN=0 ERROR=1 SKIP=29 TOTAL=30
It seems that dbt is looking for a table named googleanalytics.events_* rather than analytics_11111111.events_* and analytics_22222222.events_*.
Which version do you have installed? It's possible that you need to define dataset in your dbt_project.yml file.
From the docs:
vars:
  ga4:
    property_ids: [11111111, 22222222, 33333333]
    static_incremental_days: 3
    dataset: "my_combined_dataset"
Multi-property support creates table clones of all of the partitions in a dataset named whatever you put in place of "my_combined_dataset". The base_ga4__events model then builds itself from those clones, which are named events_*.
Older versions use project: "my_combined_dataset".
It looks like from this line that you have the correct setting:
09:42:23 Database Error in model base_ga4__events (models/staging/base/base_ga4__events.sql)
09:42:23 my-project:googleanalytics.events_* does not match any table.
Is there a "googleanalytics" dataset in BigQuery?
Is your "project" or "dataset" setting in your dbt_project.yml file set to "googleanalytics"?
It might be worth setting both project and dataset to the same value just to cover both old and new versions (as we're changing this right now).
Thank you for patiently replying to my question.
The cause is unknown, but after trying several times, I succeeded in running dbt run --select tag:googleanalytics.
However, there is still one remaining issue. To create the ga4_source_categories table from the seed file, I placed a copy of it at dbt/seeds/ga4/ga4_source_categories.csv.
In dbt_project.yml, based on the document [1], I added the following so that the seed file is not shared between the ga4 project and dev:
seed-paths: ["seeds"]
......
seeds:
ga4:
+enabled: true
+schema: seed_data
However, the following problem has occurred and has not yet been resolved. What could be the cause?
> dbt run --select tag:googleanalytics
11:45:57 Running with dbt=1.4.6
11:45:58 Encountered an error:
Compilation Error
dbt found two seeds with the name "ga4_source_categories".
Since these resources have the same name, dbt will be unable to find the correct resource
when looking for ref("ga4_source_categories").
To fix this, change the name of one of these resources:
- seed.dev.ga4_source_categories (seeds/ga4/ga4_source_categories.csv)
- seed.ga4.ga4_source_categories (seeds/ga4_source_categories.csv)
There is only the dbt/seeds/ga4/ga4_source_categories.csv file in the seeds directory.
You are over-thinking the seeds. The seed file already exists in the package.
You don't need to move it anywhere or upload it or anything. You don't need any seed settings in your dbt_project.yml file.
You literally just install the package and then use the dbt seed command.
Why don't you want to share the seed between the project and dev? It's a tiny file so cost is no issue and both the project and dev environments need it.
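For what it's worth, the compilation error lists one seed from the project (seeds/ga4/ga4_source_categories.csv) and one from the package (seeds/ga4_source_categories.csv), so deleting the project's copy should clear the name clash. If a custom schema for the package's seed is still wanted, a hedged sketch of the only setting that would be involved follows. It relies on dbt scoping seed configs by package name, and seed_data is simply the schema name used earlier in this thread.

```yaml
seeds:
  ga4:                  # applies to the seed shipped inside the ga4 package
    +schema: seed_data  # optional; omit the whole block to use the default schema
```

This block is optional. With no local CSV and no seeds block at all, dbt seed still loads the package's file.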