fpgmaas/deptry

deptry should never mark the project as a transitive dependency

baggiponte opened this issue

Is your feature request related to a problem? Please describe.

When running deptry on my source code, I always get the module itself marked as a transitive dependency.

For a minimal, reproducible example, see this repo. Here is a summary.

My code resides in src/deptry_test. The folder structure is this:

.
├── src
│  └── deptry_test
│     ├── __init__.py
│     └── mod.py

In mod.py, I define a variable holding some random gibberish. This variable is imported in __init__.py.

When I run pdm run deptry src/deptry_test, I get an error message because deptry_test is a transitive dependency (see log).

Detailed Log
Scanning 2 files...
There was 1 dependency issue found.

-----------------------------------------------------

There are dependencies missing from the project's list of dependencies:

	deptry_test

Consider adding them to your project's dependencies.

-----------------------------------------------------

Dependencies and directories can be ignored by passing additional command-line arguments. See `deptry --help` for more details.
Alternatively, deptry can be configured through `pyproject.toml`. An example:

    ```
    [tool.deptry]
    ignore_obsolete = [
        "foo"
    ]
    ignore_missing = [
        "bar"
    ]
    ignore_transitive = [
        "baz"
    ]
    extend_exclude = [
        ".*/foo/",
        "bar/baz.py"
    ]
    ```

For more information, see the documentation: https://fpgmaas.github.io/deptry/
If you have encountered a bug, have a feature request or if you have any other feedback, please file a bug report at https://github.com/fpgmaas/deptry/issues/new/choose

As a side note, IIUC this should not even be marked as a transitive dependency, am I right?

Describe the solution you would like

deptry should automatically treat the package/project itself as known-first-party, using the project.name key from pyproject.toml.

Of course, I would love to work on a PR for this if you feel it should be addressed!

Hi, thanks for the detailed report and reproduction repository, appreciate it.

To determine whether a module is local, deptry checks the root directory passed as a parameter (in addition to known-first-party):

deptry/deptry/core.py

Lines 117 to 123 in 41685dc

def _get_local_modules(self) -> set[str]:
    directories = [f for f in os.scandir(self.root) if f.is_dir()]
    guessed_local_modules = {
        subdirectory.name for subdirectory in directories if "__init__.py" in os.listdir(subdirectory)
    }
    return guessed_local_modules | set(self.known_first_party)
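To see why the choice of root matters, here is the snippet above rewritten as a plain function (the name get_local_modules and the temporary layout are for illustration only) and run against a throwaway copy of the reproduction's src/deptry_test layout:

```python
import os
import tempfile
from pathlib import Path


def get_local_modules(root: str, known_first_party: set[str]) -> set[str]:
    """Standalone copy of the _get_local_modules logic, for illustration."""
    directories = [f for f in os.scandir(root) if f.is_dir()]
    # Only *subdirectories* of root that contain an __init__.py count as local.
    guessed_local_modules = {
        d.name for d in directories if "__init__.py" in os.listdir(d)
    }
    return guessed_local_modules | set(known_first_party)


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "src", "deptry_test")
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "mod.py").write_text("x = 'gibberish'\n")

    from_src = get_local_modules(str(Path(tmp, "src")), set())  # root = src
    from_pkg = get_local_modules(str(pkg), set())               # root = src/deptry_test

print(from_src)  # {'deptry_test'}
print(from_pkg)  # set()
```

With src as the root, deptry_test is a subdirectory with an __init__.py and is found; with src/deptry_test as the root, the package directory itself is never inspected.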

In your reproduction repository, running pdm run deptry src instead of pdm run deptry src/deptry_test does work as you would expect:

$ pdm run deptry src
Scanning 2 files...
Success! No dependency issues found.

I do agree that the behaviour may feel a bit odd, though: there are situations where we want to check only a specific directory while keeping the same "root" directory that deptry uses to decide whether a module is local.

Having the ability to explicitly set the source paths as highlighted in #177 would probably solve that.

Hi, thanks for the reply!

Indeed, I ran pdm run deptry src and no error was raised. I am afraid I do not understand why the error is raised when I pass src/deptry_test. Perhaps we should append self.root to directories if self.root contains an __init__.py as well?
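A rough sketch of that suggestion, written as a hypothetical standalone function (get_local_modules_including_root is not part of deptry): in addition to scanning subdirectories, the root itself is treated as a local module when it contains an __init__.py. It still relies on __init__.py being present.

```python
import os
import tempfile
from pathlib import Path


def get_local_modules_including_root(root: str, known_first_party: set[str]) -> set[str]:
    """Hypothetical variant: also count root itself if it is a package."""
    directories = [f for f in os.scandir(root) if f.is_dir()]
    guessed = {d.name for d in directories if "__init__.py" in os.listdir(d)}
    if "__init__.py" in os.listdir(root):
        # root is itself a package, e.g. src/deptry_test -> "deptry_test"
        guessed.add(Path(root).resolve().name)
    return guessed | set(known_first_party)


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "src", "deptry_test")
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")

    from_src = get_local_modules_including_root(str(Path(tmp, "src")), set())
    from_pkg = get_local_modules_including_root(str(pkg), set())

print(from_src, from_pkg)  # {'deptry_test'} {'deptry_test'}
```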

I have not experimented with pre-commit enough, but if I run pre-commit run --all-files, the same error is raised (I added a simple .pre-commit-config.yaml to the repo, too). I can't tell whether the same error will be raised during regular pre-commit execution.

Having the ability to explicitly set the source paths as highlighted in #177 would probably solve that.

I guess that makes sense. I still feel we could default to pyproject.toml's project.name, after normalising the package name according to the appropriate PEP (e.g. from deptry-test to deptry_test).
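A small sketch of such a normalisation. pep503_normalize follows the name-normalisation rule from PEP 503 (runs of "-", "_" and "." collapse to "-", then lowercase); swapping the hyphens for underscores to get an import name is a heuristic assumption of this sketch, and both function names are hypothetical. It only works when the import name is derived from the distribution name.

```python
import re


def pep503_normalize(name: str) -> str:
    """Normalise a distribution name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def guess_import_name(project_name: str) -> str:
    """Heuristic: derive an import name from project.name (assumption)."""
    return pep503_normalize(project_name).replace("-", "_")


print(guess_import_name("deptry-test"))  # deptry_test
print(guess_import_name("Deptry.Test"))  # deptry_test
```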

Available to help if needed!

@baggiponte thanks for raising the issue and the example repository.

I am afraid I do not understand why the error is raised when I pass src/deptry_test.

The error is raised when you run pdm run deptry src/deptry_test because deptry_test is not recognized as a local module by the snippet of code @mkniewallner mentioned earlier: deptry is essentially running within the src/deptry_test directory, so it only looks at the subdirectories of deptry_test. When running pdm run deptry src, the module is recognized as local, because deptry scans the directories within src to find local modules.

Perhaps we should append self.root to directories if self.root contains an __init__.py as well?

This would be a good idea; however, we are currently trying to move away from requiring directories to contain an __init__.py (see this PR).

I guess that makes sense. I feel that we might still default to pyproject.toml:project.name as the default.

This might be a nice addition; however, there are also plenty of cases where this does not hold. See for example scikit-learn, where the project is named scikit-learn but the code lives in the sklearn directory. Hence, this approach does not seem optimal.

I think in your example repository (and any other repository using the src layout), the simple solution is to run pdm run deptry src. At the moment I cannot think of a solution that is (a) easy and intuitive to use and (b) flexible enough to support these use cases. If you have any ideas, happy to hear them!

Oh, and to clarify why it is marked as a transitive dependency: a transitive dependency is identified by (1) a package being available in the local environment, (2) said package being imported in the code, (3) there being an import from this package in the source code of your repository, but (4) the package not being listed as a dependency. For example, this flags the use of numpy in your code when only pandas is listed as a dependency. The assumption here is that an installed package that is not listed as a dependency can only have ended up in your environment as a dependency of one of your dependencies. This is not entirely true.
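A simplified sketch of the rule described above, combined with the local-module check from the earlier snippet. The function find_transitive and its set-based inputs are hypothetical and much simpler than deptry's actual code; they only show why the editable install of the project itself gets flagged when it is not recognised as local.

```python
def find_transitive(
    imported: set[str],
    installed: set[str],
    declared: set[str],
    local_modules: set[str],
) -> set[str]:
    """Flag modules that are imported and installed, but neither declared
    as dependencies nor recognised as local modules."""
    return {
        module
        for module in imported
        if module in installed
        and module not in declared
        and module not in local_modules
    }


# numpy is installed only because pandas depends on it, so it is flagged.
print(find_transitive({"pandas", "numpy"}, {"pandas", "numpy"}, {"pandas"}, set()))
# {'numpy'}

# Editable install of the project, with root = src/deptry_test: deptry_test is
# installed, imported and undeclared, and not found as a local module.
print(find_transitive({"deptry_test"}, {"deptry_test"}, set(), set()))
# {'deptry_test'}
```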

In this case, since PDM installs the package itself into the environment in editable mode, (1), (2), (3) and (4) are all true, so it is (incorrectly) flagged as a transitive dependency.

Shouldn't condition 4 above be "the package is not listed as a dependency and is also not the package being examined"? It is never correct to list a package as a dependency of itself, but it is fine to import things from the package elsewhere in the package.

In my case, I run poetry run deptry . in order to scan tests and (jupytext) notebooks as well as the code under src/.