fabrice-etanchaud/dbt-dremio

Arrow Flight and/or API support

Opened this issue · 4 comments

Hi @fabrice-etanchaud,

Has there been any work on Arrow Flight support? And is there a timeline for it?

We're considering forking and adding the support ourselves, but if there is existing work, building on that seems like the better option.

Thanks!

Hello @mrietveld-leap, thank you for your interest in the adapter!
No, I have had no time to work on the project these last months, sorry.
I am starting a rewrite to stay close to the conventions of the Spark adapter (maintained by Fishtown).

As dbt does not process data and only uses the connection to send commands to the server, what do you need Arrow Flight for? Getting rid of ODBC would be a good thing, for sure.

By the way, I strongly feel the need to remove the view overlay on the table materialization. What do you think?

Looking forward to your reply.

As dbt does not process data by itself (the only data flows I can think of are upstream when creating a seed, or downstream when querying the information_schema), switching to Arrow Flight would not bring performance benefits, but it would remove the need for the ODBC driver and everything it requires.

Adding API support instead would bring space/folders management (CREATE SCHEMA), and external tables (creation of sources from scratch).
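As a rough illustration of the folder management part, here is a minimal sketch of how a folder creation request could be built. It assumes the Dremio REST catalog endpoint (`POST /api/v3/catalog` with an `entityType` of `folder`) and a token obtained from a prior login, sent with the `_dremio` prefix; the function name `build_create_folder_request` is hypothetical and not part of the adapter. The request is only built, not sent.

```python
import json
import urllib.request


def build_create_folder_request(base_url, token, path):
    """Build (but do not send) a request asking Dremio to create a folder.

    Assumptions: the catalog endpoint is /api/v3/catalog, and `token`
    comes from a previous login call. `path` is the full folder path as
    a list, e.g. ["myspace", "staging"].
    """
    body = json.dumps({"entityType": "folder", "path": list(path)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/catalog",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header format for login tokens.
            "Authorization": f"_dremio{token}",
        },
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen`; a `CREATE SCHEMA` macro in the adapter could then delegate to something like this.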

I have started looking at https://github.com/rymurr/dremio_client as a way to switch to an API connection.
Even though that project is read-only, it would be a good starting point!
I am currently trying to wire SQL query execution to the API.
After that, I would add folder creation.
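To make the "SQL over the API" idea concrete, here is a minimal sketch, not the adapter's actual code. It assumes the Dremio REST API accepts a query at `POST /api/v3/sql` and returns a job id, and that the job state is then polled until it reaches `COMPLETED`, `FAILED` or `CANCELED`. The names `build_sql_request` and `wait_for_job` are hypothetical, and the state lookup is passed in as a callable so the polling logic can be exercised without a server.

```python
import json
import time
import urllib.request

# Job states assumed to be terminal.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def build_sql_request(base_url, token, sql):
    """Build (but do not send) a request submitting `sql` as a job."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header format for login tokens.
            "Authorization": f"_dremio{token}",
        },
    )


def wait_for_job(get_state, job_id, poll_seconds=0.5, max_polls=120):
    """Poll `get_state(job_id)` until the job reaches a terminal state.

    `get_state` is any callable returning the current job state string;
    in real use it would query the job status endpoint.
    """
    for _ in range(max_polls):
        state = get_state(job_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Since dbt mostly sends DDL and `CREATE TABLE AS` statements, waiting for the terminal state is usually all that is needed; fetching result rows only matters for the few queries against the information_schema.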

Finally, we could envision:

  • source creation by configuration (à la dbt-external-tables)
  • documenting Dremio (tags and wiki) from dbt

I have started implementing a Flight connection using:
https://docs.dremio.com/software/client-applications/python/
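For readers unfamiliar with the Flight client, here is a minimal sketch of what such a connection might look like with `pyarrow.flight`. It is an illustration under assumptions, not the adapter's implementation: it assumes basic username/password authentication and Dremio's default Flight port 32010, and the helper names `flight_uri` and `run_query` are hypothetical.

```python
def flight_uri(host, port=32010, tls=False):
    """Return the gRPC location string for a Flight endpoint.

    32010 is assumed to be the Flight port of the target Dremio server.
    """
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def run_query(host, user, password, sql, port=32010, tls=False):
    """Send `sql` over Arrow Flight and return the result as a pyarrow Table."""
    # Imported lazily so that flight_uri() stays usable without pyarrow.
    from pyarrow import flight

    client = flight.FlightClient(flight_uri(host, port, tls))
    # Basic-auth handshake; returns a (header name, bearer token) pair.
    token_pair = client.authenticate_basic_token(user, password)
    options = flight.FlightCallOptions(headers=[token_pair])

    # The SQL text itself is the Flight command descriptor.
    descriptor = flight.FlightDescriptor.for_command(sql)
    info = client.get_flight_info(descriptor, options)

    # Read the stream behind the first endpoint; enough for small results.
    reader = client.do_get(info.endpoints[0].ticket, options)
    return reader.read_all()
```

Against a running server this would be called as `run_query("localhost", "user", "password", "SELECT 1")`. No ODBC driver or DSN is involved, which is the main attraction here.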