Base GA4 events deduplication failing on event_params
Closed this issue · 0 comments
FlorianASchroeder commented
The de-duplication of events fails due to event_params
if the elements within the array are unsorted.
Artificial example:
WITH events AS (
SELECT
ARRAY<STRUCT<key STRING, value STRING>>[
STRUCT("page_path", "/"),
STRUCT("ignore_referrer", "true"),
STRUCT("session_engaged", "1")
] AS event_params
UNION ALL
SELECT
ARRAY<STRUCT<key STRING, value STRING>>[
STRUCT("ignore_referrer", "true"),
STRUCT("page_path", "/"), -- swapped order here
STRUCT("session_engaged", "1")
]
UNION ALL
SELECT
ARRAY<STRUCT<key STRING, value STRING>>[
STRUCT("page_path", "/"),
STRUCT("ignore_referrer", "true"),
STRUCT("session_engaged", "0")
]
)
SELECT *
FROM events
qualify ROW_NUMBER() OVER (PARTITION BY to_json_string(events.event_params)) = 1
Instead, an explicit ordering resolves this issue:
qualify ROW_NUMBER() OVER (PARTITION BY to_json_string(ARRAY(SELECT x FROM UNNEST(event_params) AS x ORDER BY key))) = 1