pypa/setuptools

Add ability to specify "default" extras_require

Kentzo opened this issue ยท 38 comments

Libraries that implement a client to an HTTP backend often provide multiple implementations such as for tornado, aiohttp or requests. Each implementation comes with its own dependencies. User usually needs only one implementation.

With extras_require developer of the library can specify aiohttp and torndao as optional and install requests for everyone via install_requires.

However, users of aiohttp and tornado do not need requests, hence the request to be able to specify a "default" extra that should be picked when no other extras are specified.

So that users calling pip install mylib or pip install mylib[requests] will get requests installed, but users calling pip install mylib[aiohttp] won't.

This work with pip:

from setuptools import setup

setup(name='foo', version='1.0',
      install_requires='''
      requests; extra == ""
      ''',
      extras_require={
          'aiohttp': 'aiohttp',
          'tornado': 'tornado',
      },
)

But not when setuptools' easy_install tries to install the necessary requirements (UndefinedEnvironmentName: 'extra'). For the same reason this does not work either:

from setuptools import setup

setup(name='foo', version='1.0',
      extras_require={
          'aiohttp': 'aiohttp',
          'tornado': 'tornado',
          ':extra == ""': 'requests',
      },
)

IMHO, both should be allowed. @jaraco: what do you think?

PEP 508 makes no mention of such behavior, so implementing this would cause setuptools to fall out of compliance, IMHO.

Perhaps environment should provide extra == '' when no extra is specified. That would make the first example by @benoit-pierre to work.

@agronholm PEP 508 specifies the "extra" marker but it doesn't seem to specify its value when it's not set.

I've just been experimenting with this problem...

This has broken for me since 36.2. Since then, if you use extra inside your install_requires you hit the problem @benoit-pierre describes above. Stack for me looks like this:

       Traceback (most recent call last):
         File "setup.py", line 3, in <module>
           setup(name='foo', zip_safe=False, )
         ...
         File "/home/peter/setuptools/pkg_resources/__init__.py", line 2661, in _dep_map
           if invalid_marker(marker):
         File "/home/peter/setuptools/pkg_resources/__init__.py", line 1441, in invalid_marker
           evaluate_marker(text)
         File "/home/peter/setuptools/pkg_resources/__init__.py", line 1459, in evaluate_marker
           return marker.evaluate()
         File "/home/peter/setuptools/pkg_resources/_vendor/packaging/markers.py", line 301, in evaluate
           return _evaluate_markers(self._markers, current_environment)
         File "/home/peter/setuptools/pkg_resources/_vendor/packaging/markers.py", line 226, in _evaluate_markers
           lhs_value = _get_env(environment, lhs.value)
         File "/home/peter/setuptools/pkg_resources/_vendor/packaging/markers.py", line 208, in _get_env
           "{0!r} does not exist in evaluation environment.".format(name)
       pkg_resources._vendor.packaging.markers.UndefinedEnvironmentName: 'extra' does not exist in evaluation environment.

However, invalid_marker is just trying to check if a marker is valid or not, so I wondered about catching the exception there and just saying it's not valid (by returning the exception rather than False). So I created this patch.

diff --git a/pkg_resources/__init__.py b/pkg_resources/__init__.py
index 7333464..71f614e 100644
--- a/pkg_resources/__init__.py
+++ b/pkg_resources/__init__.py
@@ -1439,6 +1439,8 @@ def invalid_marker(text):
     """
     try:
         evaluate_marker(text)
+    except packaging.markers.UndefinedEnvironmentName as e:
+        return e
     except SyntaxError as e:
         e.filename = None
         e.lineno = None
diff --git a/setuptools/tests/test_egg_info.py b/setuptools/tests/test_egg_info.py
index 66ca916..b2cd384 100644
--- a/setuptools/tests/test_egg_info.py
+++ b/setuptools/tests/test_egg_info.py
@@ -284,6 +284,19 @@ class TestEggInfo(object):
         ''',
 
         '''
+        install_requires_with_extra_marker
+
+        install_requires=["barbazquux; extra != 'foo'"],
+
+        [options]
+        install_requires =
+            barbazquux; extra != 'foo'
+
+        [:extra != "foo"]
+        barbazquux
+        ''',
+
+        '''

Seems to do the trick for me! Would you like me to submit a PR?

I don't think this is right. A better alternative would be to always define the extra environment marker, setting it to None if it's not specified. I think that's what pip does.

Yeah - that would work too. I'm not sure if that aligns with pep 508, though.

The "extra" variable is special. It is used by wheels to signal which specifications apply to a given extra in the wheel METADATA file, but since the METADATA file is based on a draft version of PEP-426, there is no current specification for this. Regardless, outside of a context where this special handling is taking place, the "extra" variable should result in an error like all other unknown variables.

Yeah... I don't know why it was specified as such. It makes some things really hard, like converting metadata from wheel to egg: https://github.com/pypa/setuptools/blob/master/setuptools/wheel.py#L105

Lovely! :-)

I'm still new to the setuptools code, so don't know where the extra variable would need to be defined in a safe way. Happy to have a look at that for a PR instead if that's what's needed, but a pointer would be appreciated...

I think this is what I had:

diff --git a/pkg_resources/__init__.py b/pkg_resources/__init__.py
index 6f1071fb..db241642 100644
--- a/pkg_resources/__init__.py
+++ b/pkg_resources/__init__.py
@@ -1456,7 +1456,7 @@ def evaluate_marker(text, extra=None):
     """
     try:
         marker = packaging.markers.Marker(text)
-        return marker.evaluate()
+        return marker.evaluate({'extra': extra})
     except packaging.markers.InvalidMarker as e:
         raise SyntaxError(e)
 
@@ -2678,7 +2678,7 @@ class Distribution(object):
                             if invalid_marker(marker):
                                 # XXX warn
                                 reqs = []
-                            elif not evaluate_marker(marker):
+                            elif not evaluate_marker(marker, extra=extra):
                                 reqs = []
                         extra = safe_extra(extra) or None
                     dm.setdefault(extra, []).extend(parse_requirements(reqs))

Looks like that's not quite enough... I get this error when I run my test input (from the previous diff) against it.

E AssertionError: warning: install_lib: 'build/lib' does not exist -- no Python modules to install
E
E Couldn't find index page for 'barbazquux' (maybe misspelled?)
E No local packages or working download links found for barbazquux
E error: Could not find suitable distribution for Requirement.parse('barbazquux')

That's normal, no? Your test is trying to install a requirement that cannot be met, change it to:

 setuptools/tests/test_egg_info.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git i/setuptools/tests/test_egg_info.py w/setuptools/tests/test_egg_info.py
index 66ca9164..b270ba25 100644
--- i/setuptools/tests/test_egg_info.py
+++ w/setuptools/tests/test_egg_info.py
@@ -373,6 +373,20 @@ class TestEggInfo(object):
 
         [empty]
         ''',
+
+        '''
+        install_requires_with_extra_marker
+
+        install_requires=["pytz; extra != 'foo'"],
+
+        [options]
+        install_requires =
+            pytz; extra != 'foo'
+
+        [:extra != "foo"]
+        pytz
+        ''',
+
         # Format arguments.
         invalid_marker=invalid_marker,
         mismatch_marker=mismatch_marker,

Sorry - I was being dense there. I'm clearly trying to do too much in too little time and making careless mistakes as a result. Is it worth packaging this fix up as a PR?

No, this does not work... Because of the way the dependency map is built when using a dist-info distribution (wheel install), pytz will be added to the common requirements (and installed no matter the extra).

@benoit-pierre Still finding my way around this code... I think you're talking about invoking the Wheel.install_as_egg() path. When I try setting up a test to invoke that, the resulting PKG-INFO file contains:

Requires-Dist: foobar; extra == "bar"

And the metedata.json also has conditional content:

{"description_content_type": "UNKNOWN", "extensions": {"python.details": {"document_names": {"description": "DESCRIPTION.rst"}}}, "extras": [], "generator": "bdist_wheel (0.30.0)", "metadata_version": "2.0", "name": "foo", "run_requires": [{"environment": "extra == \"bar\"", "requires": ["foobar"]}], "summary": "UNKNOWN", "version": "1.0"}

That all looks plausible to me... What have I missed?

Bump @benoit-pierre! Would love to see this pushed forward in some way if you have time.

di commented

Would using the empty-string extra as a default be a reasonable approach here?

from setuptools import setup

setup(name='foo', version='1.0',
      extras_require={
          'aiohttp': 'aiohttp',
          'tornado': 'tornado',
          '': 'requests',
      },
)

I was able to get this to work via setuptools with a two-line change to pkg_resources:

diff --git a/pkg_resources/__init__.py b/pkg_resources/__init__.py
index 3ae2c5cd..2226b8ae 100644
--- a/pkg_resources/__init__.py
+++ b/pkg_resources/__init__.py
@@ -2628,6 +2628,8 @@ class Distribution:
                 raise UnknownExtra(
                     "%s has no such extra feature %r" % (self, ext)
                 )
+        if not extras:
+            deps.extend(dm.get('', ()))
         return deps

     def _get_metadata(self, name):

Presumably the empty-string-extra is not used elsewhere, and running pip install foo[] already succeeds and installs all the same dependencies as pip install foo for the package as one would expect.

di commented

I made a PR: #1503 Not going to work given the current state of things.

The option to have one extra the default one really a good idea.
In the new Asyncio world many useful libraries are replaced by completely different ones.

Any update on this? This could actually be really useful!

Any update ?
CC: @benoit-pierre

I want to give this a strong +1 as useful feature. I have different use case.

I currently have package which provides an ABC and a number of implementation each which have there own dependencies. I now want to break out each implementation into its own extra:

# Old
setup(
    ...
    instal_requires=['A', 'B', 'C'],
)

# New
setup(
    ...
    instal_requires=[],
    extra_reqires = dict(A=['A'], B=['B'], C=['C'], ALL=['A', 'B', 'C']),
)

But this would break any downstream users install scripts if they expect package[ALL].
At the same time I have users that need a "minimal install". This would not be a problem
if I could default require ALL.

I want to give this a strong +1 as useful feature. I have different use case.

This is exactly the problem that we had in gql (See issue gql #147).

We finally decided to implement a breaking change in order to have optional dependencies.

Hi all, are there any news on this?

I like the idea of allowing install_requires to be replaced by extra: { None: [list, of, requirements] }. This is how pkg_resources used to work and it reduces the total number of concepts in the system, with install_requires being a special case of extra. The idea of removing a special extra if any others have been chosen would be a new feature.

I like the idea of allowing install_requires to be replaced by extra: { None: [list, of, requirements] }. This is how pkg_resources used to work and it reduces the total number of concepts in the system, with install_requires being a special case of extra. The idea of removing a special extra if any others have been chosen would be a new feature.

The problem with this is it removes current functionality. Taking OP's example, what if he does have packages that needs to be installed regardless of the HTTP backend?

A common pattern currently is having an extras_require labeled "test" that includes packages required if you want to test the library. With install_requires being replaced as described, anybody requesting these "test" packages will not get the actual ones required as specified - breaking a number of current setups.

I don't see a problem at all, but I agree that the default will be "more cumbersome".

The problem with this is it removes current functionality. Taking OP's example, what if he does have packages that needs to be installed regardless of the HTTP backend?

You add it to the extras. E.g.:

basereq = [list, of, requirements]
{extra: { None: basereq, 'extra1', basereq + ['mylib'] }}

A common pattern currently is having an extras_require labeled "test" that includes packages required if you want to test the library. With install_requires being replaced as described, anybody requesting these "test" packages will not get the actual ones required as specified - breaking a number of current setups.

Same as above; we have the full Python power to play around with lists and sets, so it should not be a problem to have the combinations one would like to have

I would think breaking every package on pypi.org that has install_requies and extras_require defined is a massive problem. Do you expect every package to update their setup.py files and release new package versions? What if I want to install an old version for a package?

No, not a single package will break. Just to introduce a new syntax: you can define None in the extras_require as a key which defines requirements that are only additional to the default.
Nothing will break since None is currently not allowed as a key, so no one has it.
So the default is equal to an "extra" None.

if you install any extra, it takes the install_requires and the extra. If no extra is specified, it takes the install_requires and the None extra (trivially: which is empty in case it is not specified).

This makes the default less of a special case, in fact it makes everything more consistent.

Allow me to add another +1 to the already long list of +1 posts. I would like to use this in ImageIO (if anyone happens to know it), to default to a backend to install, if no backends are selected explicitly. Currently we always install pillow, but that doesn't work on all platforms.

@dholth What is currently blocking this issue from progressing into a PR? A agreed-upon solution, or the manpower to implement it?

I like the idea of a None field in extras_require (I tried this intuitively, and was surprised that it didn't work), and I am happy to help with the PR/review if that can make things move forward.

+1 I have a project which is capable of using either of two packages and selects between them at runtime.
So on install, I just need to be able to say: install_requires=['thispkg OR thatpkg']

As the developer of "C", I just bumped into a scenario that requires dependencies that can be turned off. Scenario:

  • A depends on B < 3.0
  • C depends on A and B >= 3.0

It looks like C cannot dependy on both A and a newer version of B. But hold on, B isn't really necessary for A! So I suggest to the developers of A to make B an optional dependency. This is turned down, because users of A would then by default miss the functionality of B.

A solution would be an optional dependency B that in C can be turned off.

Here is my hack to quickly implement such feature within the setup.py.

from setuptools import setup
import sys

# Hack to implement a default extra require with `lxml` and `bs4` packages that can be replaced with `[minidom]`
is_minidom_extra_enabled = any('minidom' in a for a in sys.argv)

def get_requirements():
    requirements = [   'requests',
                        'aiohttp',
                        'aioresponses',
                        'cached_property', ]
    if not is_minidom_extra_enabled:
        requirements.extend(['beautifulsoup4', 'lxml'])
    return requirements

setup(
    ...
    install_requires    =   get_requirements(),
    extras_require      =   {
                             'minidom': ['dom_query'],
                            },
)

EDIT: Actually, this does not seem to work :/

Any progress?

I know I'll be shot down for this but please no. We do not need to make dependency trees more ambiguous. This concept of recommended dependencies has already plagued apt and dnf where it leads to junk ridden environments that are several times larger than they should and/or broken environments where a developer thought one of their dependencies was optional but hadn't tested that permutation and it accidentally became mandatory. If you have interchangeable dependencies, just pick one. If you have dependencies that you don't need then don't call them dependencies. If you think your packages couples well with some other package, just say so in the documentation -- don't dump it into their environment without telling them. There are many tempting uses for this feature but none of them will improve your user or developer experiences.

My use case is to have a default implementation, but offer alternatives backends.

think requests vs httpx.
Requests is a sane default for much uses, and hence makes sense as dependency for a battery-included install. However, I'd like to give the option to install httpx or something else instead - in that case without forcing the user to install requests additionally.

... so your users get a less consistent experience, any asymmetries between requests and httpx could cause a pip install requests to break their code, and you now have 4 packaging permutations to verify work sensibly rather than 1. That's not a feature. Just pick one.

jab commented

Sounds like youโ€™re assuming a poor implementation of what @luckydonald described, which, to be fair, itโ€™s easy to get this wrong, but it doesnโ€™t have to be poorly implemented: make configuration of a non-default backing implementation explicit, rather than relying on detecting what happens to be installed(*) in a โ€œtry/importโ€ฆ/except ImportErrorโ€ cascade, and ensure the fascade library provides the same functional behavior regardless of which backing implementation is in use.

(*) Consumers arenโ€™t in control of what transitive dependencies they pick up after all. So they could end up with requests as a transitive dep via some other direct dependency, but still want to use httpx as the backing implementation for this example library.

As this topic afects the semantics of the whole Python packaging, I suppose the best place to discuss it is in https://discuss.python.org/c/packaging/14.

It is likely it will require someone to put forward a PEP and a proof-of-concept in terms of installer and build backend.