BUG - Row duplication introduced when History Mode is enabled
jcbmllgn opened this issue · 8 comments
Are you a current Fivetran customer?
Yes - Jacob Mulligan, Firefly Health, head of analytics
Describe the bug
I turned on history mode in Fivetran for the User, UserRole, Account, and Opportunity tables, Fivetran then ran a migration on our data in Snowflake which took a few minutes. I then enabled history_mode
using the configuration described here. The tests on the salesforce_source
package stared failing.
Steps to reproduce
Here is the exact code we have in our dbt_project.yml
file:
vars:
salesforce_source:
salesforce_database: raw
salesforce_schema: salesforce
using_account_history_mode_active_records: true
using_opportunity_history_mode_active_records: true
using_user_role_history_mode_active_records: true
using_user_history_mode_active_records: true
At first tests passed and we merged this into master. Today the following tests broke:
Completed with 3 errors and 0 warnings:
Failure in test unique_salesforce__opportunity_enhanced_opportunity_id (models/salesforce.yml)
Got 82 results, configured to fail if != 0
compiled SQL at target/compiled/salesforce/models/salesforce.yml/schema_test/unique_salesforce__opportunity_enhanced_opportunity_id.sql
Failure in test unique_salesforce__owner_performance_owner_id (models/salesforce.yml)
Got 7 results, configured to fail if != 0
compiled SQL at target/compiled/salesforce/models/salesforce.yml/schema_test/unique_salesforce__owner_performance_owner_id.sql
Failure in test unique_stg_salesforce__user_user_id (models/stg_tables.yml)
Got 7 results, configured to fail if != 0
I was able to trace these issues back to the stg_salesforce__user_tmp.sql
model which contains row duplication. The issue seems to be that a User was updated in SalesForce and when the update was copied over to Snowflake the old record did not have _fivetran_active
set to false
, instead _fivetran_active
was still null
. There are now two rows in the users_tmp model for one user, here's what this Looks like in Snowflake:
Because _fivetran_active
is null for the first row, the coalesce()
in the stg_salesforce__user_tmp.sql
model returns both records for that user..
select *
from raw.salesforce.user
where coalesce(_fivetran_active, true)
Expected behavior
I would expect to be able to turn on History Mode for the User, UserRole, Account, and Opportunity tables and turn on History Mode for those models in this package and have the package only select the most record record for each of these tables.
Project variables configuration
I shared this above.
Package Version
packages:
- package: dbt-labs/dbt_utils
version: 0.7.1
- package: dbt-labs/segment
version: 0.6.0
- package: fivetran/fivetran_log
version: 0.4.0
- package: fivetran/salesforce_source
version: 0.3.1
- package: fivetran/salesforce
version: 0.4.0
- package: fivetran/jira
version: 0.4.0
- package: calogica/dbt_date
version: 0.4.0
Warehouse
- BigQuery
- Redshift
- Snowflake
- Postgres
- Databricks
- Other (provide details below)
Please indicate the level of urgency
Medium urgency. This is causing production tests to fail, we rely on this package for reporting which is now slightly incorrect (miscounting by 1 opportunity) due to this issue, we can't just remove the package because we rely on it for reporting, and lastly the failing test makes all deploys of production look broken 😦
Are you interested in contributing to this package?
- Yes, I can do this and open a PR for your review.
- Possibly, but I'm not quite sure how to do this. I'd be happy to do a live coding session with someone to get this fixed.
- No, I'd prefer if someone else fixed this. I don't have the time and/or don't know what the root cause of the problem is.
A (potentially temporary) fix: drop the table in Snowflake, resync the table from scratch, then rebuild the salesforce models.
I ran this in Snowflake:
drop table raw.salesforce.user;
Then resynced the table in Fivetran:
Then reran the SF job and tests:
dbt run -m +salesforce --full-refresh
dbt test -m +salesforce
The tests are now passing. I say "potentially temporary" above because I don't know if tests will start failing again after our night sync tomorrow or Tuesday as more changes are made to users/opportunity/account/userrole which might create more data formed in a way that breaks the package tests again. I'll let you know if so.
@jcbmllgn the product manager was on pto, and then I was on pto and now we are both back and I am coordinating to confirm if the resolution you mentioned is the intended method users should take. I will let you know once I hear back.
In the meantime, have you experienced any other failing tests?
Hi @jcbmllgn just got confirmation from our product team that this is an open bug which they are working on. Currently they have seen migrations experience the same behavior you noted above.
Thanks for the update Joe.
I experienced this issue with all four tables that I turned on history mode for, dropping the tables and resyncing fixed the issue for me. I don't need to turn on history mode for other tables right now, so I don't think I'll run into this issue again before a fix is released.
Sounds great, since this issue is related to the history mode feature of the connector I would recommend opening a request within our Fivetran Support Portal so you can chat with our product/support team and be notified once the fix is applied to the connector.
Hi @jcbmllgn I just wanted to reach out to see if you were able to create a support issue and see the issue resolved?
If so, I will close this issue. Thanks!
@fivetran-joemarkiewicz sorry, I missed you message here from the fall! Want me to still open a support ticket?
I have not checked to see if the underlying issue has been fixed.. Do you happen to know it has been resolved? Thanks!
Hi @jcbmllgn yes if you are still experiencing this issue then opening a support ticket should be the path forward from here as we are unable to address this issue within the dbt package.
I am unsure if the initial issue you raised has been resolved, but the members of our support and product teams should be able to help moving forward within a support ticket if you are still experiencing the issue.
As this is not something we will be able to address within the package, I will close the issue. Please feel free to reopen the issue if you feel there is still more to discuss related to the dbt package. Thanks!