lightdash/dbt2looker

Column Type None Error - Field's Not Converting To Dimensions

sisu-callum opened this issue ยท 19 comments

When running dbt2looker --tag marts on my mart models, I receive dozens of errors around none type conversions.

20:54:28 WARNING Column type None not supported for conversion from snowflake to looker. No dimension will be created.

Here is the example of the schema.yml file.

image

The interesting thing is that it correctly recognizes the doc that corresponds to the model. The explore within the model file is correct and has the correct documentation.

Not sure if I can be of any more help but let me know if there is anything!

Also tested a clean installation of 0.19.1 but that didn't resolve it!

owlas commented

@sisu-callum I can replicate this if a model has a schema.yml but doesn't exist yet in the data warehouse. The type information relies on the table having been materialized. I need to add that to the docs.

Could that be the issue? Are you working on a branch where those tables don't exist?

I'll add some debugging statements just now and push and update, which should help find the problem.

owlas commented

@sisu-callum could you try again with version 0.6.2: pip install --upgrade dbt2looker

Should give some more helpful error messages if the problem is missing tables in snowflake

Note: I'm using +persist_docs on these tables to ensure the documentation lives in Snowflake as well. Could that be impacting things?

I've upgraded dbt2looker and those errors seems to be gone. The strange thing is that I'm still not getting any dimensions materialized within the view file, which implies the error isn't missing tables in Snowflake. The steps I took were:

  1. dbt run
  2. dbt compile
  3. dbt docs generate
  4. dbt2looker --tag marts

Still the blank view files. Anything else I can try to help?

I added another model to my list of tagged files just to test. It gets picked up within the model file (along with the correct description) but the columns aren't getting picked up. I wonder if there's something with my manifest that is causing the issue.

owlas commented

@sisu-callum what's the exact output you're getting from dbt2looker?

Is it silent or are you seeing something like:

WARNING Model model.hubble_analytics.notexist not found in catalog. No looker view will be generated. Check if model has materialized in bigquery at hubble-analytics-prod.dbt.notexist

I guess you can't share the manifest.json and catalog.json. Are you able to share some extracts for the models that aren't working?

e.g. with the jq tool:

cat manifest.json | jq '.nodes["model.hubble_analytics.pages"]'
cat catalog.json | jq '.nodes["model.hubble_analytics.pages"]'

I'm also using +persist_docs but I haven't tested with snowflake so it could be!

Here is the output that I'm getting:

image

You're right in that I can't share those files but I can share a little bit here. Does this match yours? Just from this little part I drew out, I wonder if it's because data_types are showing as null.

"columns": {"objective_id": {"name": "objective_id", "description": "Lorem ipsum - removing description", "meta": {}, "data_type": null, "quote": null, "tags": []}, "objective_type": {"name": "objective_type", "description": "Lorem ipsum - why are the data_types set to null", "meta": {}, "data_type": null, "quote": null, "tags": []}

owlas commented

Your manifest.json looks similar to mine. The data_type is also null for me but it exists in the catalog.json. How does that look in the catalog.json?

Since you didn't get a warning I guess the table does exist in catalog.json

Here are the relevant columns from the same model in the catalog.json

"columns": {"OBJECTIVE_ID": {"type": "NUMBER", "index": 1, "name": "OBJECTIVE_ID", "comment": null}, "OBJECTIVE_TYPE": {"type": "TEXT", "index": 2, "name": "OBJECTIVE_TYPE", "comment": null}

I've got to step out for a bit but one thing I did just notice is that when I try to run the command you provided (just installed jq to test) it returns an error. This could be due to my novice skills with jq or perhaps this is where the issue is coming from.

image

The filepath for my catalog file is target > catalog.json

owlas commented

That looks fine! I'll have another look and add some more debugging statements.

Sorry you're right, from the root of your dbt repo it should be
cat target/catalog.json | jq '.nodes["model.stg_prod_objective"]'

Interesting! That actually returned null but when I prefixed the dbt project name to the model name it worked and returned results. Ex: cat target/catalog.json | jq '.nodes["model.my_dbt_project_name.stg_prod_objective"]'. Would this issue potentially be caused by the fact that we don't use the default dbt project name in the dbt_project.yml?

Here is the output of those two columns.
image

This is also a model where I am not persisting docs (we don't persist them in the stg layer) so that is why the comments are null. I tested on one of the actual models that persists docs and the results have the correct descriptions in the comment field. Does dbt2looker pull the descriptions of columns from the catalog.json or the manifest.json?

owlas commented

@sisu-callum nice! Thanks so much for helping with this by the way.

dbt2looker pulls comments from the manifest.json not catalog.json. That way I'd expect it to work without persist_docs.

There's definitely something wrong in dbt2looker with snowflake because you have all the information needed in your files. I'll improve the logging to help pinpoint the problem.

owlas commented

@sisu-callum can you update to dbt2looker==0.6.3, I've added some debugging statements. Could you run:

dbt2looker --log-level DEBUG

against the models you want to test and share the output if there's nothing too sensitive in there

First of all, that is some crazy fast speed on new versions! Ran this on the tagged models and got some interesting outputs! Here is an example:

14:30:34 DEBUG Model model.my project name.stg_prod_objective has no typed columns, no dimensions will be generated. unique_id='model.my project name.stg_prod_objective' resource_type='model' database='My DB' db_schema='My Schema' name='stg_prod_objective' description='To do' columns={'objective_id': DbtModelColumn(name='objective_id', description='Lorum ipsum I've changed this part.', data_type=None, meta=DbtModelColumnMeta(looker=Dbt2LookerMeta(measures=[]))),

Looks like this kind of response for all of the columns is pretty consistent across all the models included. This has me thinking its related to the fact that the data type is None. Every single column inside of my measures has data_type = None

I also got these errors at the beginning though. As I was typing this out I realized that my project uses a few dbt packages, which have their own dbt_project.yml files, which is probably causing that issue. But not sure if this is the source of the error as dbt2looker seems to be capturing all the right information except for data_type. Not sure on the multiple catalogs though - I can only see the single version so not sure what might be causing multiple to be recognized unless they are hidden.

14:30:33 WARNING Multiple manifest.json files found in path ./ this can lead to unexpected behaviour
14:30:34 DEBUG Detected valid manifest at target/manifest.json
14:30:34 WARNING Multiple catalog.json files found in path ./ this can lead to unexpected behaviour
14:30:34 DEBUG Detected valid catalog at target/catalog.json
14:30:34 WARNING Multiple dbt_project.yml files found in path ./ this can lead to unexpected behaviour

Taking a look at the manifest.json (not the catalog this time) this is what the results look like with jq. So same format as I provided up (a few comments ago) above but way cleaner.

image

owlas commented

๐Ÿคฆโ€โ™‚๏ธ I've found it. It's because snowflake converts column names to uppercase and dbt2looker treats the column names as case sensitive.

I just noticed now that in your manifest it's objective_id and in the catalog it's OBJECTIVE_ID

My favourite kind of bug after unicode bugs!

Thanks so much for taking the time to share the outputs by the way (I can see it takes some effort to redact the info).

I'm going to push an update now.

owlas commented

@sisu-callum 0.7.0 has been released!

dbt2looker now treats all columns as lower case. This is a bit of a heavy-handed solution for now but it works because postgres/redshift/snowflake/bigquery all have case-insensitive column names.

If this is the only problem then your column types should appear and start generating dimensions in the lookml

That seemed to do the trick! I noticed that TIMESTAMP_TZ seems to be producing an error. This is helpful in pointing out columns that I need to convert to TIMESTAMP_NTZ but figured I'd flag.

image

Nevermind, I see the comment in the code. With that, I think this is a resolved issue!

TIMESTAMP_TZ not supported (see https://docs.looker.com/reference/field-params/dimension_group)