Velir/dbt-ga4

Earliest Date Not Aligned with Project var start_date

Closed this issue · 7 comments

Hi! Hoping this is the right place to submit this issue.

I have recently implemented this package to handle much of the GA4 heavy-lifting in our org and so far it has proven infinitely more valuable than I thought possible.

That said, while populating my initial data set (the cutoff is 2023-07-01), the incremental tables created (base_ga4__events) only go back as far as 08-21 which is an odd date, though I have verified that the source data in GA4 goes back to 07-01 (analytics_296372144.events_20230701).

Confirmed date in the dbt log for base_ga4__events as well

from `bigquery-dealeron-reporting`.`analytics_296372144`.`events_*`
where cast( replace(_table_suffix, 'intraday_', '') as int64) >= `20230701`

Didn't run into any timeout issues, but am at a loss.

Thanks.

Can you confirm what your start_date variable is set to in your dbt_project.yml file?

From project YML

vars:
  ga4:
    project: "bigquery-dealeron-reporting"
    dataset: "analytics_296372144"
    start_date: "20230701" # Earliest date to load

Yea that's odd. This doesn't sound like a dbt or package issue if the compiled query is correct.

You said that directly querying the data provides results, but dbt does not. Can you paste both queries here? The working query and the non-working query.

To clarify - the initial GA4 data set (sharded tables) contain data all the way back to 0701, when querying the base_events table or stg_ga4__events after a dbt full refresh, the earliest date returned was 08-21

image

image

Sorry, by comparing queries I mean compare the query compiled by dbt:

select * 
from `bigquery-dealeron-reporting`.`analytics_296372144`.`events_*`
where cast( replace(_table_suffix, 'intraday_', '') as int64) >= `20230701`
order by event_date_dt asc

to an equivalent query that is working:

select * from 
from `bigquery-dealeron-reporting`.`analytics_296372144`.`events_20230701`

What is the output of both of those?

image

(opted to select event_date to minimize load)

image

In DBT:
image

As an aside, is there any sort of limitation in BQ or DBT that would prevent the creation of the tables prior to 08-21. It's exactly 60 days between 8-21 and now. Are there any settings that would lock an incremental to 60 at a time?

I've been using dbt run --full-refresh under the assumption that I need to in order to build the incremental tables. Is there anything else I could do differently?

Partition expiration?
image

Resolved this issue, needed to set the partition expiration date in the BQ options in the config block of the partition table generating scripts in order to accommodate the reporting history we needed to backfill.

Thanks for all the help!