Velir/dbt-ga4

Session default channel grouping results are significantly different from the out-of-the-box values provided by GA4

Closed this issue · 4 comments

Hi,
The package is insanely useful, however, when trying to compare the data with the original data available in GA4, we found a significant difference in the channel grouping:
From live GA4 connector on Looker Studio, we got the following number of sessions:
Organic Search: X
Paid Social: Y
Direct: Z
Let's focus on these three for now.

When using the package to count sessions based on the Session default channel grouping, the results are:
Organic Search: X - 454
Paid Social: Y + 6
(none) = Direct: Z + 922

The (none) part seems to be chewing up data from the other parts.

Is that something we can somehow fix/control? Are there any other users of this package that saw a difference like that?
We tried investigating through the macros and the data and the only things we can think of are:

  1. A different "attribution" model - the stg model that calculated the sources always uses "FIRST_VALUE" - maybe it should be last?
  2. Missing reference to different data points - if there is no source and no medium - the event will get a direct group. However - what if there's a referral? Shouldn't it be different?

I am using https://github.com/Velir/dbt-ga4/blob/main/models/staging/stg_ga4__sessions_traffic_sources_daily.sql

The Query I ran was:
select session_default_channel_grouping, count(*) from ga4_marts.dim_ga4__sessions_daily where session_partition_date between date(2023,08,06) and date(2023,09,04) group by 1 order by 2 desc

and that was compared to a Looker Studio report that was filtering the same dates.

Google applies 'last non-direct' session-level attribution in their default reports. This means that they will ignore direct traffic if there was a prior non-direct source. The model you're using looks at the actual source of traffic, not the last-non-direct source.

If you'd like the last non-direct source, try using stg_ga4__sessions_traffic_sources_last_non_direct_daily.sql and see if that helps.

Okay, now I used last_non_direct_default_channel_grouping - which instead of session_default_channel_grouping and got closer numbers. It's not identical, but that makes sense.

So now we understand that dim_ga4_sessions only has session_default_channel_grouping - which is the actual source, and dim_ga4_sessions_daily has both session_default_channel_grouping and last_non_direct_default_channel_grouping.

Sorry for the misunderstanding and thanks for the quick response :)