BUG - unique_xero__profit_and_loss_report_profit_and_loss_id fails when account_name column differs
jazzy-r opened this issue ยท 9 comments
Are you a current Fivetran customer?
Yes - Jared - Samarkand Global - Data Team Lead
Describe the bug
There is a test on the profit and loss report -
- name: profit_and_loss_id
description: Unique identifier for each record in the profit and loss report model.
tests:
- unique
- not_null
That tests for a unique profit_and_loss_id
However, this test fails when there is a difference in the account_name column.
This difference occurs when the account_name is changed in the Xero back office system.
This change does not update historic entries therefore you may end up with something like the following:
Advertising & Marketing
Advertising
When the profit_and_loss_id is generated it does not use the account_name. Therefore you end up with a duplicate profit_and_loss_id
Expected behavior
profit_and_loss_id to be unique in rows in the profit_and_loss_model
Package Version
packages:
- package: fivetran/xero_source
version: [">=0.3.0","<0.4.0"]
Warehouse
- [ X] BigQuery
- Redshift
- Snowflake
- Postgres
- Databricks
- Other (provide details below)
Screenshots
Please indicate the level of urgency
Low. Reports not customer facing yet but trying to ensure the package works as expected before creating dashboards.
Are you interested in contributing to this package?
- Yes, I can do this and open a PR for your review.
- Possibly, but I'm not quite sure how to do this. I'd be happy to do a live coding session with someone to get this fixed.
- No, I'd prefer if someone else fixed this. I don't have the time and/or don't know what the root cause of the problem is.
Hi @jazzy-r thanks so much for opening this issue! This is very interesting, and something we did not account for when originally building the package. Thank you for providing the extra context.
From your description it seems it would be most appropriate to modify the profit_and_loss_id
surrogate key to instead be created from the account_name
rather than the account_id
? Could you confirm that would or would not resolve the test failure and would be accurate based off your data. Additionally, in the screenshot you provided I see the two records with the same profit_and_loss_id
is your understanding that these should in fact be two separate records, or should their just be one (being the updated account name)?
https://github.com/fivetran/dbt_xero/blob/main/models/xero__profit_and_loss_report.sql#L14
Additionally, feel free to open a PR to the main
branch if this will do the fix. Otherwise, our team will address this bug in our next sprint.
Thanks for the response @fivetran-joemarkiewicz
1 - My thought was that adding account_name
to the surrogate key would be the best way.
2 - I would expect them to be a single record. However, I'm unsure how this would work in practice because it would need to "choose" the latest naming convention from Xero. There might also be cases where Xero users want to see the "old" and "new" account names for historic records.
If you're happy with option 1 then I will happily open a PR.
Thanks for the additional detail @jazzy-r
Thinking about this more, I want to understand how we can produce only a single record, as this is the intended result. In this case the test worked as intended. Since it failed, we now see there are two records for this account when we would expect there to only be one. If there are two, then the totals for the P&L could be off. I would like to understand more why there are duplicates of this account, and how we can account for this in the package.
I see we also have a test on the stg_xero__account
models to check for duplicate account_ids. Does this test also fail for you?
https://github.com/fivetran/dbt_xero_source/blob/cc54fed0e43dc40b79cb1071e3ad27b6770ff2fb/models/stg_xero.yml#L4-L10
- name: stg_xero__account
description: Each record in this table represents an account in Xero.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- account_id
- source_relation
If it does fail, can you look into the two records you mentioned above and see if there is any indication of when the name change occurred?
If it does not fail, then I think our issue may actually exist within our journal fields being declared within the xero__general_ledger model. I can see we are selecting the account information from the journal_line
table, where in fact we may want to be pulling from the account
table instead. This may cause the two records then because the account name changed during the year and was updated appropriately in the account
table, but the journal_line
table still references the old name.
journal_lines.account_id,
journal_lines.account_name,
journal_lines.account_type,
No that test does not fail it is only the unique profit_and_loss_id test.
My understanding is that one of our Xero users changed the account_name in the Xero back office system during a single month.
FROM Advertising TO Advertising & Marketing
Xero does not seem to historically update all references to "Advertising" with "Advertising & Marketing". Therefore within this same month when the change was made we have rows that should be grouped by and the value summed instead appearing as 2 separate lines.
If I understand what you're saying correctly there may be another JOIN for account_name that could fix this?
Thanks so much for looking into this @jazzy-r ๐
Based off your querying, I suspect we will want to modify the xero__general_ledger
query to group these records together. This way if other users have a similar occurrence the accounts will be rolled up into one. The outcome I see is having the general_ledger model roll up on the latest name of the account.
I will add this to my task list for the coming sprint and will post back here once I have a working version for you to test out.
@jazzy-r I apologize for the delay in my response. The Thanksgiving break and some PTO held me back from applying this fix we discussed in the above thread.
Fortunately, I was able to apply a hotfix on a working branch that you can test out! Would you be able to test the below package dependency in place of your existing Xero package dependency in your packages.yml
and let me know if this error is fixed and if the output is what you are expecting.
packages:
- git: https://github.com/fivetran/dbt_xero.git
revision: hotfix/gl-update-account-fields
warn-unpinned: false
Feel free to take a look at the PR #22 for the changes I applied in this hotfix. Let me know if this ends up working for you. If so, we can move forward with integrating this into the next release of the package ๐
@fivetran-joemarkiewicz No need to apologise! Just tested and it works.
That's great to hear, thanks for the confirmation @jazzy-r! I will merge this into the main
branch and plan to cut a release with this update later today ๐