fivetran/dbt_fivetran_log

[Bug] Broken connector reported as "connected"

Closed this issue · 13 comments

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

In the Fivetran UI we can see that we have a connector (Salesforce sandbox) that is broken:
Fivetran-ui-broken

We also have our own System status dashboard where we rely on the data coming out of the models in this dbt package.
The issue here is that it's showing that this connector is healthy, described as "connected"

row-model-connected

I quickly started digging into what might be going wrong, and I may have found something. If I run:

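    select *
    from "DB"."SCHEMA"."stg_fivetran_log__log"
    -- parentheses are needed because AND binds tighter than OR;
    -- without them, SEVERE events from every connector are returned
    where (event_type = 'SEVERE'
        or event_subtype like 'sync%')
        and connector_id = 'crusade_plausible'
    order by created_at desc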

I get a sync_start event, then a SEVERE log, and then a sync_end event.

rows-from-stg-fivetran-log-log

When these events are later categorised, this might be where the issue arises:

    when is_paused then 'paused'
    -- a sync has never been attempted
    when last_sync_started_at is null then 'incomplete'
    -- a priority-first sync has occurred, but a normal sync has not
    when last_priority_first_sync_completed_at is not null and last_sync_completed_at is null then 'priority first sync'
    -- a priority sync has occurred more recently than a normal one (may occur if the connector has been paused and resumed)
    when last_priority_first_sync_completed_at > last_sync_completed_at then 'priority first sync'
    -- a sync has been attempted, but not completed, and it's not due to errors. also a priority-first sync hasn't
    when last_sync_completed_at is null and last_error_at is null then 'initial sync in progress'
    -- there's been an error since the connector last completed a sync
    when last_error_at > last_sync_completed_at then 'broken'
    -- there's never been a successful sync and there have been errors
    when last_sync_completed_at is null and last_error_at is not null then 'broken'

For us this is a high priority issue since we rely on the data from our System status dashboard where we monitor all our Fivetran connectors.

Relevant error log or model output

No response

Expected behavior

We expect the model fivetran_log__connector_status to report the connector as broken if it's broken in the Fivetran UI.

dbt Project configurations

    models:
      +persist_docs:
        relation: true
        columns: true
      data_eng:
        sources:
          +materialized: table
      fivetran_log:
        +schema: # leave these blank to use the target_schema
        staging:
          +schema: # leave these blank to use the target_schema

    vars:
      fivetran_log:
        fivetran_log_database: db
        fivetran_log_schema: schema

Package versions

    packages:
      - package: fivetran/fivetran_log
        version: 0.5.0

What database are you using dbt with?

snowflake

dbt Version

dbt version 1.0.0

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.

Hi @carlioth, thanks so much for opening this issue and providing such detailed notes on your investigation.

Looking at what you provided above, I agree that this connector should be showing as broken and not as connected. Based on the staging query you have above, I would have thought that the line below would capture the SEVERE event and log an error time of 2022-02-25 06:26:08.978.

max(case when connector_log.event_type = 'SEVERE' then connector_log.created_at else null end) as last_error_at,

However, after looking further into this I can see that the logic is in fact working but not in the way we would like. We have made the assumption that a broken connector would not have a sync_end event. This in fact seems to not be the case as I can see the SEVERE record and then a subsequent sync_end event. This sync_end event is then negating the last_error_at field due to the below line.

when last_error_at > last_sync_completed_at then 'broken'

For your data, since last_error_at is technically earlier than last_sync_completed_at, the connector is not recorded as broken.
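The comparison described above can be reproduced with a minimal, self-contained sketch. Only the SEVERE timestamp comes from this thread. The sync_start and sync_end times, the INFO event type, the `connector_health` column name, and the trailing `else 'connected'` branch are assumptions made for illustration, and only a subset of the status branches is shown.

```sql
-- Hypothetical sample events for one connector.
-- Timestamps are ISO-formatted strings, so string order matches time order.
with connector_log as (
    select 'sync_start' as event_subtype, 'INFO' as event_type, '2022-02-25 06:26:01.000' as created_at
    union all
    select null, 'SEVERE', '2022-02-25 06:26:08.978'   -- timestamp quoted in this thread
    union all
    select 'sync_end', 'INFO', '2022-02-25 06:26:10.000'
),

agg as (
    select
        max(case when event_subtype = 'sync_start' then created_at end) as last_sync_started_at,
        max(case when event_subtype = 'sync_end' then created_at end) as last_sync_completed_at,
        max(case when event_type = 'SEVERE' then created_at end) as last_error_at
    from connector_log
)

select
    last_error_at,
    last_sync_completed_at,
    case
        when last_sync_started_at is null then 'incomplete'
        -- the error is older than the trailing sync_end, so this branch is skipped
        when last_error_at > last_sync_completed_at then 'broken'
        when last_sync_completed_at is null and last_error_at is not null then 'broken'
        else 'connected'   -- assumed fallback branch
    end as connector_health
from agg
```

With these sample rows the query returns `connected`, because the sync_end event is newer than the SEVERE event.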

Before we take any next steps, I would like to understand better why this had a sync_end event following a failure. I will follow up with our engineering team to get a better understanding of this. Further, would you be able to share the entire contents of the JSON object that includes the SEVERE warning you provided in the screenshot above?

Hi @fivetran-joemarkiewicz
Thanks for the quick response and the helpful details.

The full log for the SEVERE says:
{"reason":"java.lang.Exception: Authentication failure. Reconnect the connector with the latest username and password","taskType":"reconnect","status":"FAILURE_WITH_TASK"}

For full transparency: in my query above I removed all logs with WARNING as the event type, although they are also included in the model fivetran_log__connector_status.
I removed those events because we are currently drowning in them. We are getting approximately 60 of these warnings per second:
{"type":"table_excluded_by_system","message":"salesforce_icrm_prod.<TABLE> has been Excluded by system. Reason : Not queryable"}

Update:
We are now seeing the same behaviour but for another connector. This time the error is:
{"reason":"com.fivetran.core.PrimaryKeyContainsNull: Null primary key found while syncing table *****
Looking at the logs, we see the same pattern: first sync_start, then the SEVERE log, and then sync_end.

Thanks for these detailed updates! I am still looking into this, but hopefully will come to a conclusion soon. I will keep you updated on my end!

Any updates on this, @fivetran-joemarkiewicz?

Hi @carlioth, I apologize, but I do not have a substantial update at this time.

The last movement on my end was working with the product manager for the Fivetran Log connector, who believes sync_end events should exist for all connectors (even broken ones). If this is the case, then my team and I will want to update the broken status logic in this package accordingly.

However, the PM was not 100% certain and, as of Friday, was following up with the engineering team to confirm this. I hope to have an update this week!

Hi @fivetran-joemarkiewicz, any updates on this issue?

Hi @andersrundberg, thanks for reaching out! I have not yet been able to connect with our engineering team to verify this. However, based on the connector's December 2021 release notes, I am reasonably certain that we will want to update the logic within our dbt package to account for sync_end events for failed connectors, since connectors should now register this log regardless of success or failure.

That being said, we will want to update the dbt package to reflect the current state of the log connector. I will make an update this week in a working branch and share it here for you to test before we roll out any updates in the next release.
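As an illustration of what accounting for sync_end on failed syncs could look like, here is a minimal sketch. It is not the change that was later shipped in the package, and the rule it uses (an error at or after the latest sync_start marks the connector as broken) is an assumption, as are the sample sync_start and sync_end times, the INFO event type, the `connector_health` column name, and the `else 'connected'` branch.

```sql
-- Hypothetical sample events: sync_start, then SEVERE, then sync_end.
-- Timestamps are ISO-formatted strings, so string order matches time order.
with connector_log as (
    select 'sync_start' as event_subtype, 'INFO' as event_type, '2022-02-25 06:26:01.000' as created_at
    union all
    select null, 'SEVERE', '2022-02-25 06:26:08.978'
    union all
    select 'sync_end', 'INFO', '2022-02-25 06:26:10.000'
),

agg as (
    select
        max(case when event_subtype = 'sync_start' then created_at end) as last_sync_started_at,
        max(case when event_subtype = 'sync_end' then created_at end) as last_sync_completed_at,
        max(case when event_type = 'SEVERE' then created_at end) as last_error_at
    from connector_log
)

select
    case
        when last_sync_started_at is null then 'incomplete'
        -- hypothetical rule: an error during the most recent sync attempt
        -- counts as broken even if a sync_end event follows it
        when last_error_at >= last_sync_started_at then 'broken'
        when last_sync_completed_at is null and last_error_at is not null then 'broken'
        else 'connected'   -- assumed fallback branch
    end as connector_health
from agg
```

With the same three events, this variant returns `broken`, because the SEVERE event falls inside the latest sync attempt. One trade-off of this rule is that a connector would keep reporting `broken` until a new sync starts, which may or may not match what the Fivetran UI shows.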

Thank you so much for your patience!

@fivetran-joemarkiewicz cool, please tag @carlioth when there is any news.

Hi @andersrundberg and @carlioth

Thank you again for your patience and for helping us identify this issue within the package. I am currently working on a fix and believe I am on the right track. When you have availability, would you be able to test the below version of the package within your dbt project and see if it identifies the broken/paused/working connectors properly?

packages:
    - git: https://github.com/fivetran/dbt_fivetran_log.git
      revision: bugfix/connector-status
      warn-unpinned: false

Let me know if you have any questions, and whether or not the issue is resolved within this branch of the package.

Thanks!

Hi @fivetran-joemarkiewicz
I've now tested this, and I'm getting the status of broken for the broken connectors.
Working as expected 👍🏻
Do you have an ETA for when you will be able to release this fix?

That's great to hear! I still want to make a few minor changes, but I will be opening this up for PR review today and hope to have the fix released tomorrow!

Edit: We typically do release freezes on Friday afternoons. Because of that, a more realistic timeline would be a Monday release.

Hi All,

I just wanted to share that the PR with the fix has been merged and the new v0.5.3 release has been cut! The latest version of the package should be live on the dbt hub at the top of the hour.

Feel free to create a new issue if you encounter any questions while using the Fivetran Log package. Thanks again for your help in raising and resolving this issue.