Session default channel grouping results are significantly different from the out-of-the-box values provided by GA4
Closed this issue · 4 comments
Hi,
The package is insanely useful, however, when trying to compare the data with the original data available in GA4, we found a significant difference in the channel grouping:
From live GA4 connector on Looker Studio, we got the following number of sessions:
Organic Search: X
Paid Social: Y
Direct: Z
Let's focus on these three for now.
When using the package to count sessions based on the Session default channel grouping, the results are:
Organic Search: X - 454
Paid Social: Y + 6
(none) = Direct: Z + 922
The (none) part seems to be chewing up data from the other parts.
Is that something we can somehow fix/control? Are there any other users of this package that saw a difference like that?
We tried investigating through the macros and the data and the only things we can think of are:
- A different "attribution" model - the stg model that calculated the sources always uses "FIRST_VALUE" - maybe it should be last?
- Missing reference to different data points - if there is no source and no medium - the event will get a direct group. However - what if there's a referral? Shouldn't it be different?
For the metrics pulled from dbt-ga4, can you clarify whether you're using https://github.com/Velir/dbt-ga4/blob/main/models/staging/stg_ga4__sessions_traffic_sources_daily.sql
or
If you want to include the actual SQL code, that could help as well
The Query I ran was:
select session_default_channel_grouping, count(*) from ga4_marts.dim_ga4__sessions_daily where session_partition_date between date(2023,08,06) and date(2023,09,04) group by 1 order by 2 desc
and that was compared to a Looker Studio report that was filtering the same dates.
Google applies 'last non-direct' session-level attribution in their default reports. This means that they will ignore direct traffic if there was a prior non-direct source. The model you're using looks at the actual source of traffic, not the last-non-direct source.
If you'd like the last non-direct source, try using stg_ga4__sessions_traffic_sources_last_non_direct_daily.sql and see if that helps.
Okay, now I used last_non_direct_default_channel_grouping
- which instead of session_default_channel_grouping
and got closer numbers. It's not identical, but that makes sense.
So now we understand that dim_ga4_sessions only has session_default_channel_grouping
- which is the actual source, and dim_ga4_sessions_daily has both session_default_channel_grouping
and last_non_direct_default_channel_grouping
.
Sorry for the misunderstanding and thanks for the quick response :)