fabrice-etanchaud/dbt-dremio

dbt-dremio cannot create automatic documentation

Closed this issue · 5 comments

Hi Fabrice,

Dremio is currently being adopted by my organization, and it is great to have the ability to build dbt pipelines against it. Thank you for creating this extension.

Recently we found that the dbt docs generate command fails with the message below.

Would it be possible to address the missing implementation of get_catalog? Or is there maybe an option to avoid this message?

Thank you!

=======================================================================
dbt docs generate

15:10:52 Running with dbt=1.0.5
15:10:52 Found 3 models, 0 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
15:10:52
15:10:52 schema_relation:"XXXXX"
15:10:55 Concurrency: 1 threads (target='prod')
15:10:55
15:10:55 Done.
15:10:55 Building catalog
15:10:55 Encountered an error while generating catalog: Compilation Error in macro get_catalog (macros\adapters\metadata.sql)
get_catalog not implemented for dremio

in macro default__get_catalog (macros\adapters\metadata.sql)
called by macro get_catalog (macros\adapters\metadata.sql)
called by macro get_catalog (macros\adapters\metadata.sql)
15:10:55 dbt encountered 1 failure while writing the catalog
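From what I can tell, the error comes from dbt's macro dispatch: get_catalog resolves to an adapter-prefixed implementation (dremio__get_catalog) when one exists, and otherwise falls back to default__get_catalog, which does nothing but raise this error. Roughly (paraphrased from dbt-core's macros/adapters/metadata.sql, so take the exact wording with a grain of salt):

```sql
{# Paraphrased fallback from dbt-core: when no adapter provides
   <adapter>__get_catalog, this default is dispatched and simply
   raises the compilation error seen in the log above. #}
{% macro default__get_catalog(information_schema, schemas) -%}
  {% set typename = adapter.type() %}
  {% set msg -%}
    get_catalog not implemented for {{ typename }}
  {%- endset %}
  {{ exceptions.raise_compiler_error(msg) }}
{%- endmacro %}
```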

Hi @gnhernandez, thank you for giving dbt-dremio a try!
I can see you are using dbt 1.0.5.
Could you please tell me which Dremio version you are on?
I suspect a problem with the sys.reflections columns, whose names changed recently.

As I saw you are from NY..
https://www.youtube.com/watch?v=kX7p1pxEhKs

Best regards from French west coast.

Thank you very much for the prompt reply @fabrice-etanchaud.

The Dremio version we are using is

Dremio Community Edition 20.1.0, built on February 6th 2022.

Am I reading correctly between the lines that "dbt docs" used to work with older versions? I took a cursory look at the error message and saw that other adapters include a catalog.sql file with a get_catalog macro returning information on schemas, tables and columns (I just checked the postgres and snowflake ones).

I was thinking that maybe this file needs to be created for Dremio, using something like the metadata tables described here:

https://docs.dremio.com/software/advanced-administration/querying-metadata/
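To make the idea concrete, here is a rough sketch of what such a macro might look like, assuming Dremio exposes the standard INFORMATION_SCHEMA views and following the column aliases dbt's catalog generally expects. Everything here is a guess on my part and not tested against Dremio:

```sql
{# Hypothetical sketch only: a dremio__get_catalog macro built on
   Dremio's INFORMATION_SCHEMA views. The aliases mirror the shape
   dbt's catalog expects; names and quoting are unverified guesses. #}
{% macro dremio__get_catalog(information_schema, schemas) -%}
  {% call statement('catalog', fetch_result=True) %}
    select
        'dremio'           as table_database
      , t.table_schema     as table_schema
      , t.table_name       as table_name
      , t.table_type       as table_type
      , c.column_name      as column_name
      , c.ordinal_position as column_index
      , c.data_type        as column_type
    from information_schema."tables" t
    join information_schema.columns c
      on  c.table_schema = t.table_schema
      and c.table_name   = t.table_name
    where {% for schema in schemas -%}
            t.table_schema = '{{ schema }}'
            {%- if not loop.last %} or {% endif %}
          {%- endfor %}
  {% endcall %}
  {{ return(load_result('catalog').table) }}
{%- endmacro %}
```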

I'm by no means a SQL or dbt expert, so I'm not at all sure whether this understanding is correct.

What do you think? And tell me honestly if I'm off :)

Thank you for the video, it brought back memories of the city pre-pandemic.

Sending you the best from the US East Coast,

Napoleon

Hi @gnhernandez, do you use dbt-dremio v1.0.4.0?
I don't understand why it does not work,
as there is a get_catalog implementation:
https://github.com/fabrice-etanchaud/dbt-dremio/blob/v1.0.4.0/dbt/include/dremio/macros/catalog.sql

Or did you run pip3 install -e . inside a master checkout?
In that case, could you please pull from github and try again?

Best,
Fabrice

Hi Fabrice,

Thank you so much for pointing this out. I'm not sure at all how it happened, but my installed version of dbt-dremio was missing this file.

I rebuilt the environment from scratch, and indeed the catalog.sql file got installed this time.

Apologies for raising a red herring here, I really have no clue how this happened.

Thank you very much again for putting this library together; it is awesome, and it opens the door for other people whose organizations are adopting Dremio.

All the best,
Napoleon

Hi Napoleon, happy to hear it finally worked!
A new release is coming, with the following features:

  • simple stupid table materialization (getting rid of the hazardous blue/green table handling)
  • incremental materialization (append-only for the time being), based on Iceberg
  • format options for sources and models (csv, arrow, json, parquet, iceberg; excel and delta only for sources)
  • separate database+schema configuration for table-based materializations living in a datalake source and views living in spaces
  • 'twin' strategy config to handle what to do when a model changes sides (space <=> datalake): allow, prevent, clone

If you have time, I would be very pleased to have a short conversation and get your feedback on the project!
Best,