replit/upm

python package guesser should understand direct vs transitive dependencies

turbio opened this issue · 2 comments

The map (https://github.com/replit/upm/blob/master/internal/backends/python/pypi_packages.json) that Python's package guesser uses puts modules from transitive dependencies into the same set as the modules a package provides directly. We should build a better mapping that records direct and transitive modules separately.

Transitive modules should only be used to determine which modules are already satisfied by the installed packages. When guessing which package to add, we should only use the modules a package provides directly.
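A minimal sketch of how a split mapping could drive the guesser, written in Python for brevity. It assumes each package entry stores two separate sets. `PackageModules`, `satisfied_modules`, `guess_candidates`, and `packages_to_guess` are hypothetical names, not upm's actual types or functions, and the example mapping is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PackageModules:
    """Hypothetical per-package entry: modules the package ships itself
    versus modules that only appear because of its dependencies."""
    direct: frozenset[str] = field(default_factory=frozenset)
    transitive: frozenset[str] = field(default_factory=frozenset)


def satisfied_modules(installed: set[str], mapping: dict[str, PackageModules]) -> set[str]:
    """All modules importable thanks to the installed packages,
    counting both direct and transitive modules."""
    covered: set[str] = set()
    for pkg in installed:
        entry = mapping.get(pkg)
        if entry is not None:
            covered |= entry.direct | entry.transitive
    return covered


def guess_candidates(module: str, mapping: dict[str, PackageModules]) -> list[str]:
    """Packages that provide `module` directly; transitive providers are ignored."""
    return sorted(pkg for pkg, entry in mapping.items() if module in entry.direct)


def packages_to_guess(
    imports: set[str], installed: set[str], mapping: dict[str, PackageModules]
) -> dict[str, list[str]]:
    """For each import not already satisfied, list its direct-provider candidates."""
    covered = satisfied_modules(installed, mapping)
    return {m: guess_candidates(m, mapping) for m in sorted(set(imports) - covered)}


# Illustrative mapping: "requests" pulls in "urllib3" transitively.
mapping = {
    "requests": PackageModules(direct=frozenset({"requests"}),
                               transitive=frozenset({"urllib3"})),
    "urllib3": PackageModules(direct=frozenset({"urllib3"})),
}
print(packages_to_guess({"requests", "urllib3"}, set(), mapping))
print(packages_to_guess({"requests", "urllib3"}, {"requests"}, mapping))
```

With a flat map, `import urllib3` would also match `requests`. Here only `urllib3` is suggested. Once `requests` is installed, its transitive `urllib3` still counts as satisfied, so nothing extra is guessed.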

If you don't mind me asking, how is the map generated?

Good question, @remram44! It's pretty crazy: we have an internal script that creates a fresh Python environment, installs a specific package, and then inspects the available Python modules. We run this for every single package on PyPI to generate the JSON mapping file.
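One possible way such a script could separate direct from transitive modules is to compare the modules visible after the install against the files the package itself installed, as listed in its `RECORD`. This is a sketch under stated assumptions, not the internal script described above. It assumes a baseline set of modules is captured in the fresh environment before installing, and the function names are hypothetical. It uses the standard library's `pkgutil.iter_modules` and `importlib.metadata.files`.

```python
import pkgutil
from importlib.metadata import files
from pathlib import PurePosixPath

MODULE_SUFFIXES = (".py", ".so", ".pyd")


def available_modules() -> set[str]:
    """Top-level modules importable from sys.path (builtins are not listed)."""
    return {info.name for info in pkgutil.iter_modules()}


def own_record_paths(dist_name: str) -> list[str]:
    """Files installed by the distribution itself (assumed to run inside
    the fresh environment right after installing `dist_name`)."""
    return [str(p) for p in (files(dist_name) or [])]


def top_level_modules(record_paths: list[str]) -> set[str]:
    """Derive top-level module names from RECORD-style relative paths."""
    modules: set[str] = set()
    for raw in record_paths:
        path = PurePosixPath(raw)
        parts = path.parts
        # Skip scripts outside site-packages and stray bytecode caches.
        if not parts or parts[0] in ("..", "__pycache__"):
            continue
        head = parts[0]
        # Skip metadata directories such as "requests-2.31.0.dist-info".
        if head.endswith((".dist-info", ".egg-info", ".data")):
            continue
        if path.suffix not in MODULE_SUFFIXES:
            continue
        if len(parts) > 1:
            # A source or extension file inside a top-level package directory.
            modules.add(head)
        else:
            # A single-file module or extension, e.g.
            # "_cffi_backend.cpython-311-x86_64-linux-gnu.so" -> "_cffi_backend".
            modules.add(head.split(".", 1)[0])
    return modules


def split_modules(
    after: set[str], baseline: set[str], own_paths: list[str]
) -> tuple[set[str], set[str]]:
    """Direct = modules the package ships itself; transitive = every other
    module that appeared after the install."""
    direct = top_level_modules(own_paths)
    transitive = (set(after) - set(baseline)) - direct
    return direct, transitive
```

In a fresh environment, the script would call `available_modules()` before and after `pip install <package>`, then pass both sets plus `own_record_paths(<package>)` to `split_modules`.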