fivetran/dbt_xero

BUG - unique_xero__profit_and_loss_report_profit_and_loss_id fails when account_name column differs

jazzy-r opened this issue ยท 9 comments

Are you a current Fivetran customer?

Yes - Jared - Samarkand Global - Data Team Lead

Describe the bug

There is a test on the profit and loss report -

  - name: profit_and_loss_id
    description: Unique identifier for each record in the profit and loss report model.
    tests:
      - unique  
      - not_null

That tests for a unique profit_and_loss_id

However, this test fails when there is a difference in the account_name column.

This difference occurs when the account_name is changed in the Xero back office system.

This change does not update historic entries therefore you may end up with something like the following:

Advertising & Marketing
Advertising

When the profit_and_loss_id is generated it does not use the account_name. Therefore you end up with a duplicate profit_and_loss_id

Expected behavior

profit_and_loss_id to be unique in rows in the profit_and_loss_model

Package Version

packages:
  - package: fivetran/xero_source
    version: [">=0.3.0","<0.4.0"]

Warehouse

  • [ X] BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Screenshots
image
Please indicate the level of urgency

Low. Reports not customer facing yet but trying to ensure the package works as expected before creating dashboards.

Are you interested in contributing to this package?

  • Yes, I can do this and open a PR for your review.
  • Possibly, but I'm not quite sure how to do this. I'd be happy to do a live coding session with someone to get this fixed.
  • No, I'd prefer if someone else fixed this. I don't have the time and/or don't know what the root cause of the problem is.

Hi @jazzy-r thanks so much for opening this issue! This is very interesting, and something we did not account for when originally building the package. Thank you for providing the extra context.

From your description it seems it would be most appropriate to modify the profit_and_loss_id surrogate key to instead be created from the account_name rather than the account_id? Could you confirm that would or would not resolve the test failure and would be accurate based off your data. Additionally, in the screenshot you provided I see the two records with the same profit_and_loss_id is your understanding that these should in fact be two separate records, or should their just be one (being the updated account name)?

https://github.com/fivetran/dbt_xero/blob/main/models/xero__profit_and_loss_report.sql#L14

Additionally, feel free to open a PR to the main branch if this will do the fix. Otherwise, our team will address this bug in our next sprint.

Thanks for the response @fivetran-joemarkiewicz

1 - My thought was that adding account_name to the surrogate key would be the best way.

2 - I would expect them to be a single record. However, I'm unsure how this would work in practice because it would need to "choose" the latest naming convention from Xero. There might also be cases where Xero users want to see the "old" and "new" account names for historic records.

If you're happy with option 1 then I will happily open a PR.

Thanks for the additional detail @jazzy-r

Thinking about this more, I want to understand how we can produce only a single record, as this is the intended result. In this case the test worked as intended. Since it failed, we now see there are two records for this account when we would expect there to only be one. If there are two, then the totals for the P&L could be off. I would like to understand more why there are duplicates of this account, and how we can account for this in the package.

I see we also have a test on the stg_xero__account models to check for duplicate account_ids. Does this test also fail for you?
https://github.com/fivetran/dbt_xero_source/blob/cc54fed0e43dc40b79cb1071e3ad27b6770ff2fb/models/stg_xero.yml#L4-L10

  - name: stg_xero__account
    description: Each record in this table represents an account in Xero.
    tests: 
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - account_id
            - source_relation

If it does fail, can you look into the two records you mentioned above and see if there is any indication of when the name change occurred?

If it does not fail, then I think our issue may actually exist within our journal fields being declared within the xero__general_ledger model. I can see we are selecting the account information from the journal_line table, where in fact we may want to be pulling from the account table instead. This may cause the two records then because the account name changed during the year and was updated appropriately in the account table, but the journal_line table still references the old name.

        journal_lines.account_id,
        journal_lines.account_name,
        journal_lines.account_type,

No that test does not fail it is only the unique profit_and_loss_id test.

My understanding is that one of our Xero users changed the account_name in the Xero back office system during a single month.

FROM Advertising TO Advertising & Marketing

Xero does not seem to historically update all references to "Advertising" with "Advertising & Marketing". Therefore within this same month when the change was made we have rows that should be grouped by and the value summed instead appearing as 2 separate lines.

If I understand what you're saying correctly there may be another JOIN for account_name that could fix this?

Thanks so much for looking into this @jazzy-r ๐Ÿ‘€

Based off your querying, I suspect we will want to modify the xero__general_ledger query to group these records together. This way if other users have a similar occurrence the accounts will be rolled up into one. The outcome I see is having the general_ledger model roll up on the latest name of the account.

I will add this to my task list for the coming sprint and will post back here once I have a working version for you to test out.

@jazzy-r I apologize for the delay in my response. The Thanksgiving break and some PTO held me back from applying this fix we discussed in the above thread.

Fortunately, I was able to apply a hotfix on a working branch that you can test out! Would you be able to test the below package dependency in place of your existing Xero package dependency in your packages.yml and let me know if this error is fixed and if the output is what you are expecting.

packages:
    - git: https://github.com/fivetran/dbt_xero.git
      revision: hotfix/gl-update-account-fields
      warn-unpinned: false

Feel free to take a look at the PR #22 for the changes I applied in this hotfix. Let me know if this ends up working for you. If so, we can move forward with integrating this into the next release of the package ๐Ÿ˜„

@fivetran-joemarkiewicz No need to apologise! Just tested and it works.

That's great to hear, thanks for the confirmation @jazzy-r! I will merge this into the main branch and plan to cut a release with this update later today ๐Ÿ˜„

@jazzy-r I am happy to say this fix is now live in the v0.3.1 release of the package on the dbt hub!

Thanks again for helping identify this and for working with us to make this package better! Have a great weekend ๐Ÿ˜„