databricks/databricks-vscode

[BUG] Cannot import from custom library with src/ layout in "Run as workflow"

bixel opened this issue · 2 comments

Describe the bug
When running a .py file as a workflow, imports do not work from custom libraries when using the src/ layout (see e.g. https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/).
The /src directory is not present in the path.

To Reproduce

  1. Create a Python project using the src layout, e.g. with poetry new --src --no-interaction custom. This results in a structure like
    $ tree custom/
    custom
    ├── README.md
    ├── pyproject.toml
    ├── src
    │   └── custom
    │       └── __init__.py
    └── tests
        └── __init__.py
    
    4 directories, 4 files
    
  2. Everything is synced successfully to Databricks
  3. Create a workload or Databricks notebook anywhere in the repo and import the custom library, e.g.
    import custom
    print(dir(custom))
  4. Run the file as a workflow via the VS Code extension
  5. Local cross-check via pip install -e custom and python workload.py should work

System information:

  1. VSCode Info
    Version: 1.85.1
    Commit: 0ee08df0cf4527e40edc9aa28f4b5bd38bbff2b2
    Date: 2023-12-13T09:48:16.874Z (3 wks ago)
    Electron: 25.9.7
    ElectronBuildId: 25551756
    Chromium: 114.0.5735.289
    Node.js: 18.15.0
    V8: 11.4.183.29-electron.0
    OS: Darwin arm64 22.6.0

  2. Databricks Extension Version v1.2.4

Databricks Extension Logs
Please attach the databricks extension logs

The Issue template link is broken.

Additional context
It looks like the VS Code extension injects some Python code before running workflows. This code inserts the root project directory into the Python path, which is what makes a flat layout work for Python packages. This might be the place where a src layout could also be supported.
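To illustrate the idea, here is a minimal sketch. It assumes the injected bootstrap does something equivalent to putting the project root on sys.path; I have not checked the extension's actual code, and candidate_roots and can_import are hypothetical helper names made up for this example. It builds a throwaway src-layout project and shows that the import fails with only the root on the path, but works once src/ is added too.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def candidate_roots(project_root: Path) -> list[str]:
    """Hypothetical helper: the project root, plus src/ when that directory exists."""
    roots = [str(project_root)]
    src = project_root / "src"
    if src.is_dir():
        roots.append(str(src))
    return roots


def can_import(name: str, extra_paths: list[str]) -> bool:
    """Try to import `name` with `extra_paths` temporarily prepended to sys.path."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)      # forget any earlier import of the same name
    importlib.invalidate_caches()    # make sure freshly created files are seen
    sys.path[:0] = extra_paths
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path[:] = saved_path     # restore the original search path
        sys.modules.pop(name, None)


# Recreate the layout from the reproduction steps in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "custom"
    (root / "src" / "custom").mkdir(parents=True)
    (root / "src" / "custom" / "__init__.py").write_text("")

    # Assumed current behaviour: only the project root is on the path.
    flat_only = can_import("custom", [str(root)])
    # Proposed behaviour: the root and src/ are both on the path.
    with_src = can_import("custom", candidate_roots(root))

print("root only:", flat_only)
print("root + src:", with_src)
```

Until something like this is supported, the same sys.path insertion at the top of workload.py might serve as a workaround, although I have not verified that on a cluster.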

Hi @bixel. Indeed, we do not support the src layout. We treat all local file imports as module imports and not as installed libraries. This is equivalent to running the code directly with python workload.py instead of doing a pip install -e custom and then python workload.py.

If you want proper library support, I suggest using a Databricks Asset Bundle (https://docs.databricks.com/en/dev-tools/bundles/index.html), which allows you to define the libraries used by your notebook.

We are working on integrating this view into VS Code, but it will be at least a month before we have a beta.

Hi @kartikgupta-db, thanks for the lightning-fast response :) At first glance Asset Bundles look promising; I only just learned about them.