fpgmaas/deptry

deptry does not work when installed globally

fpgmaas opened this issue · 14 comments

Describe the bug

Whenever deptry is installed globally, it does not have access to the metadata of the packages in the virtual environment, even if that virtual environment is activated.

I will see if I can either

  • solve this, which I think will be difficult, or...
  • state clearly in the documentation that deptry should be installed within the virtual environment to be tested, and optionally log a warning to the console whenever one or more dependencies are not found in the environment.

To Reproduce

Install deptry globally with pip install deptry, outside of the virtualenv. Then activate a virtualenv and run deptry .

Added a warning for now: #93

Just to confirm: is it impossible to perform static code analysis on a project with a pyproject.toml, a poetry.lock and a .venv directory?

No, it's definitely possible to scan a project with pyproject.toml, poetry.lock and .venv when deptry is added to the project with poetry add --dev deptry.

But installing it globally with pip install deptry and then scanning a Poetry project will not work. It really needs to be installed within the virtual environment. (So your project covid19-sir is not affected.)

Yes, this issue is about the global installation of deptry. I was just thinking that the "script" section of the target pyproject.toml can be read from outside the virtual environment.

On Reddit, someone offered this as a potential starting point for adding the functionality: https://stackoverflow.com/a/14792407

Not sure if we should add this kind of solution to the codebase though.

@mkniewallner suggested using site.getsitepackages(), which returns the site-packages directories holding the installed modules; these should be the virtual environment's when the venv is active.
A good source of inspiration for that may be mypy, which has similar needs; see here and here.

I tried this out, but it was also unsuccessful: site.getsitepackages() does not seem to return the virtual environment's site-packages directory.

To reproduce:

  • add import site and print(site.getsitepackages()) anywhere in cli.py.
  • install deptry globally with pip install -e .
  • navigate to a directory with an installed Poetry environment and a .venv folder.
  • run poetry shell
  • run deptry .

In my case, this returns:

['/Users/florian.maas/.pyenv/versions/3.9.11/lib/python3.9/site-packages']

This is followed by a list of warnings, since deptry could not find the installed packages.

However, when running the following steps:

  • poetry shell
  • python
  • import site
  • site.getsitepackages()

The output is:

['/Users/florian.maas/git/my-project/.venv/lib/python3.9/site-packages']

So then it does find the correct site-packages directory.
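site.getsitepackages() is computed from the prefixes of the interpreter that calls it, which explains the difference above: the globally installed deptry runs on the pyenv interpreter, while python inside poetry shell is the .venv interpreter. One possible workaround, sketched here under the assumption that the path of the environment's python executable is already known (site_packages_of is a hypothetical helper, not deptry code), is to ask that interpreter directly in a subprocess.

```python
import json
import subprocess

# Code executed by the *target* interpreter: print its own site-packages list as JSON
_QUERY = "import json, site; print(json.dumps(site.getsitepackages()))"


def site_packages_of(python_executable: str) -> list[str]:
    """Return site.getsitepackages() as seen by another Python interpreter (hypothetical helper)."""
    result = subprocess.run(
        [python_executable, "-c", _QUERY],
        check=True,           # raise if the interpreter fails to run the query
        capture_output=True,  # collect stdout so it can be parsed
        text=True,            # decode stdout as str
    )
    return json.loads(result.stdout)
```

For an activated venv, the interpreter is usually $VIRTUAL_ENV/bin/python on POSIX systems and %VIRTUAL_ENV%\Scripts\python.exe on Windows. The subprocess costs one interpreter start-up, but the answer then comes from the environment itself rather than from guesses about its directory layout.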

Hi 👋🏻 I have a very naïve question: from the site.getsitepackages() strategy you tried, I assume that getting the path of the active virtualenv would suffice, even if deptry is not run by the virtualenv interpreter. Could this path not be retrieved using the VIRTUAL_ENV environment variable exported by the activation script?

If I'm completely off-topic (which I fear 😅), I'll be glad to have some pointers to the codebase that might help me understand the problem better!

@kwentine Thanks for the suggestion. That's not a naive question; don't be afraid to ask! I am not an expert on this subject myself either.

The issue lies in this part of the code. Here, we try to get the metadata of a package using importlib-metadata, which I believe requires the virtual environment's site-packages directory to be in sys.path.
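For context, importlib.metadata searches sys.path by default, which is why the interpreter running deptry matters. Its discovery functions also accept an explicit path. A minimal sketch, assuming the environment's site-packages directory is already known (distribution_names_in is a hypothetical helper name):

```python
from importlib.metadata import distributions


def distribution_names_in(site_packages: str) -> list[str]:
    """Names of the distributions whose metadata lives in one site-packages directory."""
    # path= replaces the default search path (sys.path) for this lookup only
    return sorted(dist.metadata["Name"] for dist in distributions(path=[site_packages]))
```

Because the search path is a parameter here, the interpreter running this code does not need to be the one that owns site_packages.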

Your idea of using VIRTUAL_ENV seems pretty good. However, this points to <some_path>/example-project/.venv, whereas the packages are actually stored in <some_path>/example-project/.venv/lib/python3.10/site-packages. We could try to build a solution around this that looks for a site-packages directory recursively within VIRTUAL_ENV.

An issue I can think of with this solution: how do we detect whether it's necessary to perform this recursive search?
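The VIRTUAL_ENV-based search discussed above could be sketched as follows. Rather than a fully recursive search (e.g. Path.rglob over the whole venv), this version only checks the two standard venv layouts, which is cheaper and avoids descending into installed packages. Both function names are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_site_packages_below(root: Path) -> Path | None:
    """Return a site-packages directory inside a virtualenv root, or None (hypothetical helper)."""
    # POSIX layout: <root>/lib/pythonX.Y/site-packages; Windows layout: <root>/Lib/site-packages
    for pattern in ("lib/python*/site-packages", "Lib/site-packages"):
        matches = sorted(p for p in root.glob(pattern) if p.is_dir())
        if matches:
            return matches[0]
    return None


def site_packages_from_virtual_env() -> Path | None:
    """Use the VIRTUAL_ENV variable exported by the activation script, if it is set."""
    virtual_env = os.environ.get("VIRTUAL_ENV")
    return find_site_packages_below(Path(virtual_env)) if virtual_env else None
```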


@fpgmaas thanks for the encouragement and this enlightening entry point 🙂 I'd like to share an idea based on importlib.metadata's suggested extension mechanism.

First, suppose we have a way of reliably detecting if deptry is currently running in a virtualenv.

import sys

def running_in_virtualenv() -> bool:
    # See https://docs.python.org/3/library/sys.html?highlight=sys#sys.base_prefix for this strategy
    return sys.prefix != sys.base_prefix
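One caveat: a tool installed with pipx runs inside its own virtualenv, so sys.prefix != sys.base_prefix is also true there even though that venv is not the project's. A minimal sketch of a stricter check, combining it with the VIRTUAL_ENV suggestion from earlier in the thread (the function name is hypothetical, and the environment mapping is a parameter so it can be tested):

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping


def needs_external_site_packages(environ: Mapping[str, str] | None = None) -> bool:
    """True when a virtualenv is activated but is not the one running this code."""
    environ = os.environ if environ is None else environ
    activated = environ.get("VIRTUAL_ENV")
    if not activated:
        return False  # nothing activated, so nothing extra to search
    if sys.prefix == sys.base_prefix:
        return True  # running on a global interpreter while a venv is active
    # Running in *some* venv (e.g. pipx's): search only if it differs from the activated one
    return os.path.realpath(activated) != os.path.realpath(sys.prefix)
```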

Then, suppose we have a few heuristics to guess a project's virtualenv site-packages on the filesystem:

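import os
from pathlib import Path

def find_virtualenv_site_packages() -> Path | None:
    # current_project_dir() and find_site_packages_below() are hypothetical helpers
    project_dir: Path = current_project_dir()
    virtual_env = os.environ.get("VIRTUAL_ENV")
    possible_roots = [
        Path(virtual_env) if virtual_env else None,  # set by the activation script
        project_dir / ".venv",                       # in-project venv (e.g. Poetry)
        Path("~/.virtualenvs").expanduser() / project_dir.name,
    ]
    # Try the candidates in order and stop at the first site-packages found
    for root in possible_roots:
        if root is not None:
            site_packages = find_site_packages_below(root)
            if site_packages:
                return site_packages
    return None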

Then we could implement and install a sys.meta_path finder along the lines of:

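import sys
from importlib.machinery import PathFinder
from importlib.metadata import DistributionFinder

class VirtualenvDistributionFinder(DistributionFinder):
    @classmethod
    def find_distributions(cls, context):
        if not running_in_virtualenv():
            site_packages = find_virtualenv_site_packages()
            if site_packages:
                # Search the project's site-packages before the requested path (sys.path by default)
                path = [str(site_packages), *context.path]
                context = DistributionFinder.Context(name=context.name, path=path)
        # DistributionFinder.find_distributions is abstract, so calling super() would
        # return None; delegate to the standard path-based metadata finder instead
        return PathFinder.find_distributions(context)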

Let me know if I need to make the idea clearer. If you think this might be a way to go, I'll work on a PR 🙂

Well, I realize that implementation would be highly inefficient, since it would call find_virtualenv_site_packages every time package metadata is looked up. So let's say "a less clumsy variation of the above":

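if not running_in_virtualenv():
    site_packages = find_virtualenv_site_packages()
    if site_packages is not None:
        # Assumes a variant of the finder that stores site_packages in __init__,
        # so the search runs once instead of on every metadata lookup
        sys.meta_path.insert(0, VirtualenvDistributionFinder(site_packages=site_packages))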
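A runnable version of this "compute once" variation could look like the sketch below. It assumes a hypothetical finder class (SitePackagesDistributionFinder, installed by a hypothetical install_finder helper) that receives the site-packages directory in its constructor. Instead of calling the abstract DistributionFinder.find_distributions through super(), it delegates to importlib.machinery.PathFinder.find_distributions, the standard path-based metadata lookup.

```python
import sys
from importlib.machinery import PathFinder
from importlib.metadata import DistributionFinder
from pathlib import Path


class SitePackagesDistributionFinder(DistributionFinder):
    """Hypothetical finder that looks in one extra site-packages directory first."""

    def __init__(self, site_packages: Path) -> None:
        # Computed once by the caller, e.g. via a VIRTUAL_ENV-based search
        self.site_packages = str(site_packages)

    def find_distributions(self, context=DistributionFinder.Context()):
        # Prepend the extra directory to the requested path (sys.path by default)
        path = [self.site_packages, *context.path]
        extended = DistributionFinder.Context(name=context.name, path=path)
        # Reuse the standard path-based metadata lookup instead of reimplementing it
        return PathFinder.find_distributions(extended)

    def find_spec(self, fullname, path=None, target=None):
        # Metadata only: never take part in actual imports
        return None


def install_finder(site_packages: Path) -> SitePackagesDistributionFinder:
    """Put the finder at the front of sys.meta_path so its matches are found first."""
    finder = SitePackagesDistributionFinder(site_packages)
    sys.meta_path.insert(0, finder)
    return finder
```

With the finder installed, importlib.metadata.version() and similar calls consult the extra directory before sys.path; sys.meta_path.remove(finder) undoes it.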

I think this is the most promising and detailed starting point so far, better than anything I could think of 😄 So if you think it's worth a shot, I look forward to reviewing the PR that implements this.

Late to the party, but it's probably worth taking a look at how pipdeptree added support for arbitrary virtualenvs: https://github.com/tox-dev/pipdeptree/blob/28bf158e98e95109a426aad8a0ac3b1ea2044d4a/src/pipdeptree/_non_host.py#L16