duckdb/dbt-duckdb

Allow import from other python modules

azngeek opened this issue · 7 comments

Currently, it is not possible to import your own packages. For example, I have some transformations in Python modules that import code from another package.

This fails because dbt run does not find my own packages. Here is an example with one module. I hacked a few lines just for this screenshot, but the module exists and runs fine locally when I use the Python CLI instead of dbt.

image

How can this be solved?

jwills commented

Ah so a couple of ways: one way is to create some sort of python environment for the dbt run-- Docker container, virtualenv, whatever floats your boat-- install your package inside of it, and then use it from dbt-duckdb.

Of course, that's a decent amount of work and kind of annoying if the logic inside of the custom module is changing frequently, so I added a poorly-documented profile setting called module_paths so that in your profile you could point dbt at a list of directories that you want added to sys.path on startup so that you can use them in your Python models without having to go through all of that environment setup stuff.
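To make the second option concrete, here is a minimal sketch of what adding a directory to sys.path does for imports. It is plain Python and does not call dbt-duckdb at all; the lib directory, my_module.py and foo are placeholder names chosen for illustration, and the temporary project directory only exists so the snippet can run anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a "lib" directory containing my_module.py.
project_dir = Path(tempfile.mkdtemp())
lib_dir = project_dir / "lib"
lib_dir.mkdir()
(lib_dir / "my_module.py").write_text("def foo() -> int:\n    return 1729\n")

# Assumption: this mirrors the effect described for module_paths,
# i.e. each listed directory ends up on sys.path.
sys.path.append(str(lib_dir))
importlib.invalidate_caches()

# Once the directory is on sys.path, a plain import by module name works.
my_module = importlib.import_module("my_module")
result = my_module.foo()
print(result)
```

If the import still fails inside a dbt run, printing sys.path from within the Python model is a quick way to check whether the directory was actually picked up.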

Thanks for the update. I saw that setting, but was not able to get it up and running.

  1. The first solution might work, but our teams usually make a lot of code changes, so it is not very practical.
  2. I went for solution number 2, and this is what I tried: I created a sample module in the directory my_module and then tried to import it, but I still get this error. Do you have a working example in the repository?

17:04:41 Runtime Error in model stg_article_with_categories (models/python/stg_article_with_categories.py) Python model failed: No module named 'my_module'

image

jwills commented

Which version of the project? And you did the import like import example?

jwills commented

I should have said: I have an example of a project that used the module paths here: https://github.com/jwills/jaffle_shop_duckdb

Yes I did. What I did not do was additionally register the module in profiles.yml. To wrap up, here are my steps in case anyone needs them :)

How to use your own modules

In order to use your own Python code in a Python model, a new plugin needs to be created and registered. First, set the appropriate configuration in your profiles.yml.

The important configuration options are module_paths and plugins. You can pass any number of directories as a list to module_paths. For plugins, use the module name, that is, the filename of your module without the .py extension.

local_warehouse:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: db/warehouse.duckdb
      threads: 24
      module_paths:
        - lib
      plugins:
        - module: my_module

This example corresponds to the following file/directory structure:

├── lib
│   └── my_module.py

For the plugin, create a class that extends dbt.adapters.duckdb.plugins.BasePlugin. Make sure the name of each function is unique among all registered functions!

from duckdb import DuckDBPyConnection

from dbt.adapters.duckdb.plugins import BasePlugin
from dbt.adapters.duckdb.utils import TargetConfig

def foo() -> int:
    return 1729


# The python module that you create must have a class named "Plugin"
# which extends the `dbt.adapters.duckdb.plugins.BasePlugin` class.
class Plugin(BasePlugin):
    def configure_connection(self, conn: DuckDBPyConnection):
        conn.create_function("foo", foo)

The function can now be used in any model like this:

import my_module

def model(dbt, session):
    print(my_module.foo())

Another question: can you return any value? It seems that more complex types like dicts result in this error:

20:32:02  Encountered an error:
Runtime Error
  Invalid Input Error: Could not infer the return type, please set it explicitly

This would be the example:

def dict_test() -> any:
    return {"key": "value"}

jwills commented

Mmm return from what? The return type of the model function needs to be something DuckDB can turn into a table, but for your own utility functions you should be able to return anything you want.
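One possible reading of the error, assuming dict_test was registered through conn.create_function in the plugin the same way foo was: DuckDB tries to derive the SQL return type from the Python annotation, and the lowercase any in the example is the builtin function rather than a type. The sketch below only inspects the annotations with plain Python and does not call DuckDB; dict_test_json is a hypothetical workaround that returns a JSON string so the annotation becomes a plain str.

```python
import json
from typing import Any


def dict_test() -> any:  # as posted: `any` is the builtin function, not a type
    return {"key": "value"}


def dict_test_json() -> str:
    # Hypothetical workaround: serialize the dict so the annotation is str.
    return json.dumps({"key": "value"})


# The annotation stored on the original function is the builtin `any`,
# which is not the same object as typing.Any.
original_annotation = dict_test.__annotations__["return"]
is_builtin_any = original_annotation is any
is_typing_any = original_annotation is Any

# The workaround carries an ordinary type, and the dict survives a round trip.
fixed_annotation = dict_test_json.__annotations__["return"]
decoded = json.loads(dict_test_json())
print(is_builtin_any, is_typing_any, fixed_annotation, decoded)
```

DuckDB's create_function also takes an explicit return_type argument, which is what the error message asks for. If the helper is only imported and called from Python models and never registered as a SQL function, the annotation should not matter.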