Different row count between raw data from ga4 and base_ga4__events
Closed this issue · 2 comments
Background: I'm currently using this package to generate session based data based on ga4 events raw data. In order to migrate our company old data marts to use ga4 data.
Problem Statement: as the title says.
Steps to Reproduce: Run this package and use your own raw data. check and compare count of raw table and base_ga4__events
Expected: the rows have same count
Actual Behavior: around 200k rows or 0.4% data points are missing.
Any info is there any configuration missing?
I only add config below in my dbt projects
vars:
ga4:
source_project:
property_ids:
start_date: "20240118" # Earliest date to load
static_incremental_days: 3 # Number of days to scan and reprocess on each run
query_parameter_exclusions: ["gclid","fbclid","_ga"]
query_parameter_extraction: ["gclid","fbclid","keyword"]
user_properties:
- user_property_name: "some_property_name"
value_type: "string_value"
Hi @tegardp , there is no expectation that the row counts should match. We dedupe events based on the parameters listed here:
So it is normal to have fewer events in the base model than in the raw data.
yea I'm aware of that line of code.
But actually if I remove that line of code. The value match with the Google Analytics One. Meanwhile, If I keep that line, there's bound to have some discrepancies :/