fivetran/dbt_xero

[Bug] Using BigQuery adapter, validation_errors - Invalid cast from STRING to JSON

MatthewDaniels opened this issue · 7 comments

Is there an existing issue for this?

  • I have searched the existing issues

Describe the issue

First, thanks for the great work on these models!

When using BigQuery as the data warehouse and running the Xero transform dbt models, there is one tmp step that fails, causing the invoice line items model to be skipped.

The step is stg_xero__invoice_line_item_tmp and the error is in the following line:
`cast('validation_errors' as JSON) as 'validation_errors'`
(This ends up on line 26 of the rendered SQL in BQ, but for my use case, I am using multiple schemas to create a single set of output models.)

The error being returned by BigQuery is the following:
`Invalid cast from STRING to JSON at [111:26]`

(again, the line numbers are likely specific to my project as I have multiple input schemas)
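
For context, the failing cast sits inside the large cross-schema select that the package compiles for this tmp model. A simplified sketch of its shape (placeholder project/schema names, not my exact compiled SQL) looks like this:

```sql
-- Simplified sketch only; `my-project.xero_a` / `xero_b` stand in for my real schemas.
select
    cast(invoice_id as string) as invoice_id,
    cast(validation_errors as json) as validation_errors  -- this is the cast BigQuery rejects
from `my-project.xero_a.invoice_line_item`

union all

select
    cast(invoice_id as string) as invoice_id,
    cast(validation_errors as json) as validation_errors
from `my-project.xero_b.invoice_line_item`
```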


I also recently noticed the following notification for the Xero connectors (and a few others) in Fivetran, whilst using BigQuery as the destination, and wondered if the two were related:
(screenshot of the Fivetran notification about the BigQuery JSON rollout)

I am unsure how to troubleshoot further, but I am happy to support where I can.

Relevant error log or model output

No models are output.

Errors as above in the description.

Final log status from the `dbt run` command as follows:

```
06:52:23  Completed with 1 error and 0 warnings:
06:52:23
06:52:23  Database Error in model stg_xero__invoice_line_item_tmp (models/tmp/stg_xero__invoice_line_item_tmp.sql)
06:52:23    Invalid cast from STRING to JSON at [111:26]
06:52:23    compiled Code at target/run/xero_source/models/tmp/stg_xero__invoice_line_item_tmp.sql
```


Expected behavior

SQL is compiled appropriately and the [xero__invoice_line_items](https://github.com/fivetran/dbt_xero/blob/main/models/xero__invoice_line_items.sql) table is produced.

dbt Project configurations

The following are the variables (obfuscated input schemas) in the `dbt_project.yml`:

```yaml
vars:
  xero_source:
    union_schemas: ['xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx','xero_xxxxxxx']
```

Using threads: 4 in the profile.

No other project-specific vars or configurations are used.

Package versions

```yaml
packages:
  - package: fivetran/xero
    version: [">=0.6.0", "<0.7.0"]
```

What database are you using dbt with?

bigquery

dbt Version

```
Core:
  - installed: 1.6.1
  - latest:    1.7.2 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 1.6.3 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation
```

After attempting an update, I get the same issue with the following output from `dbt --version`:

```
Core:
  - installed: 1.7.2
  - latest:    1.7.2 - Up to date!

Plugins:
  - bigquery: 1.7.2 - Up to date!
```

Additional Context

No response

Are you willing to open a PR to help address this issue?

  • Yes.
  • Yes, but I will need assistance and will schedule time during our office hours for guidance
  • No.

Hi @MatthewDaniels thank you for opening this issue. I am aware of this JSON BigQuery rollout; however, I wasn't aware that this would impact the Xero package as we do not perform any JSON parsing or JSON operations in the package.

It seems this issue is being caused because some of the Xero connectors you are unioning have the validation_errors field in the source, while others do not. For the schemas that do have this field, the datatype is now JSON, but for the schemas that don't have it the package fills the field in with a string datatype. You can see the logic here.
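
To illustrate (a simplified sketch with placeholder schema names, not the exact macro output), the per-schema selects that get unioned end up looking roughly like this:

```sql
-- Connector that has received the BigQuery JSON rollout:
-- validation_errors is synced as a JSON column.
select validation_errors /* , other columns */
from `my-project.xero_a.invoice_line_item`

-- Connector whose source table has no validation_errors column:
-- the package backfills it as a string.
select cast(null as string) as validation_errors /* , other columns */
from `my-project.xero_b.invoice_line_item`
```

The union then has to reconcile a string column with a JSON column, which appears to be where the failing cast comes from.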

However, when looking into this further I noticed we don't even bring this field into the downstream staging model (or any other downstream models). As a result, I believe the best fix here is to simply remove the field from the macro so the package does not attempt to union it along with all the other fields being used in the package. Is this field crucial to any other analytics you are performing? If not, I would suggest we remove the field.

Here is a branch you may test out that removes the field. Let me know if this solves your immediate issue. If you do not have any immediate concerns with removing this field then we may take the next steps to roll this out in an official release.

Note: to test the changes I mentioned above, replace the official dbt hub package version in your packages.yml with the following.

```yaml
packages:
  - git: https://github.com/fivetran/dbt_xero.git
    revision: patch/json-union
    warn-unpinned: false
```

Let me know how this branch goes!

Thank you so much for getting back to me so quickly on this - I really appreciate it and all the work that has gone into these packages!

Unfortunately I am facing the same issue after updating my package to use the patch branch (although I have had partial success thanks to your explanation, @fivetran-joemarkiewicz 👍). Perhaps I am missing something, as I am fairly new to dbt (but not to coding / scripting / data engineering), which also makes it difficult for me to step through the processes and debug myself (so any pointers would be massively welcomed!).

The steps I have followed are:

  • Update the packages.yml
  • Execute dbt deps
  • Execute dbt run

I got the same error, and the compiled SQL in the target directory still has validation_errors present in it for some reason.

I then also attempted to explicitly execute dbt clean (I understand dbt deps does this anyway, but thought I would try regardless). I also double-checked that the target/ and dbt_packages/ folders were removed at this step (of course they were!).

I also searched the dbt_packages/ folder for the validation_errors string to see where it appears (in an attempt to step through the processing). In the new patch branch it only appears in the docs, seeds, tests, and one macro (and there only as part of the column name has_validation_errors).


These are the contents of my packages.yml:

```yaml
packages:
  # - package: fivetran/xero
  #   version: [">=0.6.0", "<0.7.0"]
  - git: https://github.com/fivetran/dbt_xero.git
    revision: patch/json-union
    warn-unpinned: false
```

This is the contents of my package-lock.yml:

```yaml
packages:
- git: https://github.com/fivetran/dbt_xero.git
  revision: patch/json-union
- git: https://github.com/fivetran/dbt_xero_source.git
  revision: patch/json-union
- package: fivetran/fivetran_utils
  version: 0.4.8
- package: dbt-labs/dbt_utils
  version: 1.1.1
sha1_hash: 69caaeb46e02a33fe81eefb9f7da1f5c2eac34e0
```

These are the last few lines of the CLI output:

```
22:58:25  Completed with 1 error and 0 warnings:
22:58:25
22:58:25    Database Error in model stg_xero__invoice_line_item_tmp (models/tmp/stg_xero__invoice_line_item_tmp.sql)
  Invalid cast from STRING to JSON at [111:26]
  compiled Code at target/run/xero_source/models/tmp/stg_xero__invoice_line_item_tmp.sql
22:58:25
22:58:25  Done. PASS=24 WARN=0 ERROR=1 SKIP=2 TOTAL=27
```

💯 Partial success

I tested with a SINGLE SCHEMA (i.e. without the unioning, which as noted is where the errors can arise) and I did not get the problem (likely because that single schema had no validation errors).
I then added back in some of the other schemas and got it working successfully, but not with all the schemas I wanted to include.

I am super interested to understand why this patch may not have excluded the validation_errors column altogether.

Hi @MatthewDaniels I am glad you are seeing partial success! 🎉 To get to full success, can you also try running dbt clean before rerunning dbt deps? This will reset your dbt_packages folder and ensure you are using the new patch version of the package.

If that doesn't work, I will take a deeper look to see if there is something else I need to adjust.

Hi @fivetran-joemarkiewicz, sorry I wasn't clear (or rather likely added too much info 😄) in my response (and thanks again for the super quick reply!), but I did attempt to explicitly execute dbt clean - I understand dbt deps does this, but thought I would try anyway. I also double-checked that the target/ and dbt_packages/ folders and their contents were removed at this step (of course they were!).

Both resulted in no change to the compiled SQL, and I am unsure how to follow up from here, sorry!

Updated Understanding of the Issue

Hi @MatthewDaniels thanks for adding the additional context. I was confused about why removing the field did not seem to work. However, upon further investigation I found that the macro I changed in the patch is not actually the cause of the issue. Instead, within the fivetran_utils.union_data macro that is called in the tmp models to union the data, we call the dbt_utils.union_relations() macro. This macro inspects the source tables and prepares all the fields to be unioned across the various relations. At this point no fields have been removed; this step simply unions the raw data synced by the Fivetran connectors.

In your case, it seems that some of your Xero connectors have received the JSON update while others have not. This results in the union_relations macro attempting to union the same field with differing datatypes, which produces the error you see.
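
As a minimal illustration of the BigQuery behavior itself (scratch queries, unrelated to the package):

```sql
-- Fails: BigQuery does not allow casting STRING to JSON,
-- which is the "Invalid cast from STRING to JSON" error above.
select cast('{"code": "x"}' as json);

-- Works: converting a string into JSON requires PARSE_JSON instead.
select parse_json('{"code": "x"}');
```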

Possible Interim Solution

I was able to reproduce this error locally and made an adjustment to the underlying union_data macro so that its call to the dbt_utils.union_relations macro includes an exclude argument. This way we can exclude the unnecessary validation_errors field from the unioning step performed in the package. Would you be able to rerun your project with these latest changes I applied and let me know if this fully solves your issue?
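
The shape of that change is roughly the following (a simplified sketch, not the exact macro code; relations_to_union is a placeholder for the list of relations the macro builds):

```sql
-- Simplified sketch of the adjusted unioning step inside the union_data macro
{{ dbt_utils.union_relations(
    relations=relations_to_union,
    exclude=['validation_errors']
) }}
```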

Please note you will need to fully delete your package-lock.yml file and then run dbt clean, dbt deps, and dbt run. I am unsure why, but when making these changes I noticed they were not picked up after a normal clean and deps unless I deleted the lock file and allowed dbt to regenerate it. Let me know how this goes!

Ultimately, if this does solve the issue you are more than welcome to continue using this branch. However, I would encourage opening a support ticket and raising this datatype inconsistency with our support team so it can be addressed at the connector level. Thanks!

This is awesome insight, thanks heaps @fivetran-joemarkiewicz .

I can confirm that this worked with all of the schemas included! 🎉

With regards to the update of the field to JSON in the Fivetran connector - I have found three of the nine that have not been updated, one of which has not even synced in the last six months (so I will need to fix that!). I think two of the sources receive very few updates (if any), so perhaps Fivetran has not completed the updates because of that?

At any rate - my dbt issue is resolved and I thank you so much for the help!

Thanks for sharing @MatthewDaniels! I am happy to hear that this patch resolved your immediate issue and that you will be able to work more closely with support to understand any updates that need to be applied to the connectors.

In the meantime, I will keep this issue open (but will mark it as won't fix) for the time being in case others run into the same issue you experienced.

Thanks again!