Earliest Date Not Aligned with Project var start_date
Closed this issue · 7 comments
Hi! Hoping this is the right place to submit this issue.
I have recently implemented this package to handle much of the GA4 heavy-lifting in our org and so far it has proven infinitely more valuable than I thought possible.
That said, while populating my initial data set (the cutoff is 2023-07-01), the incremental tables created (base_ga4__events) only go back as far as 08-21 which is an odd date, though I have verified that the source data in GA4 goes back to 07-01 (analytics_296372144.events_20230701).
Confirmed date in the dbt log for base_ga4__events as well
from `bigquery-dealeron-reporting`.`analytics_296372144`.`events_*`
where cast( replace(_table_suffix, 'intraday_', '') as int64) >= `20230701`
Didn't run into any timeout issues, but am at a loss.
Thanks.
Can you confirm what your start_date
variable is set to in your dbt_project.yml
file?
From project YML
vars:
ga4:
project: "bigquery-dealeron-reporting"
dataset: "analytics_296372144"
start_date: "20230701" # Earliest date to load
Yea that's odd. This doesn't sound like a dbt or package issue if the compiled query is correct.
You said that directly querying the data provides results, but dbt does not. Can you paste both queries here? The working query and the non-working query.
Sorry, by comparing queries I mean compare the query compiled by dbt:
select *
from `bigquery-dealeron-reporting`.`analytics_296372144`.`events_*`
where cast( replace(_table_suffix, 'intraday_', '') as int64) >= `20230701`
order by event_date_dt asc
to an equivalent query that is working:
select * from
from `bigquery-dealeron-reporting`.`analytics_296372144`.`events_20230701`
What is the output of both of those?
(opted to select event_date to minimize load)
As an aside, is there any sort of limitation in BQ or DBT that would prevent the creation of the tables prior to 08-21. It's exactly 60 days between 8-21 and now. Are there any settings that would lock an incremental to 60 at a time?
I've been using dbt run --full-refresh
under the assumption that I need to in order to build the incremental tables. Is there anything else I could do differently?
Resolved this issue, needed to set the partition expiration date in the BQ options in the config block of the partition table generating scripts in order to accommodate the reporting history we needed to backfill.
Thanks for all the help!