globus/globus-cli

Are strict pinnings of packages needed?

Closed this issue · 4 comments

xylar commented

I'm maintaining the conda-forge feedstocks for globus-sdk and globus-cli. So far, I've been exactly copying the package restrictions from setup.py in each feedstock, e.g.:

        'click>=6.6,<7.0',
        'jmespath==0.9.2',
        'configobj>=5.0.6,<6.0.0',
        'requests>=2.0.0,<3.0.0',
        'six>=1.10.0,<2.0.0',
        'cryptography>=1.8.1,<2.6.0'

But I also use these globus tools as part of a larger conda metapackage. I'm wondering if the upper bounds on click, configobj, requests, six and cryptography are necessary or if the developers are only being cautious. In the case of click, it prevents incorporation of the latest version into a conda environment where other packages may expect the latest version. In the case of configobj, requests, six and cryptography, it seems merely precautionary, since no current versions meet the restrictions. I believe it is considered best practices to leave versions open unless incompatibilities are known rather than restricting versions out of precaution, since more restriction makes the likelihood of conflicts between packages that cannot be resolved much more likely.
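
To make the effect of these bounds concrete, here is a minimal sketch of a range check. It is not how pip or conda evaluate specifiers (real tools follow PEP 440); `parse_version` and `satisfies` are hypothetical helpers that only understand plain dotted numbers and the three operators used in the list above.

```python
import re

def parse_version(text):
    """Turn '6.6' or '2.0.0' into a tuple of ints, padded to three parts."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

# Only the operators that appear in the setup.py excerpt are handled.
OPS = {
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}

def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=6.6,<7.0'."""
    v = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|==|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not OPS[op](v, parse_version(bound)):
            return False
    return True

print(satisfies("6.7", ">=6.6,<7.0"))  # True
print(satisfies("7.0", ">=6.6,<7.0"))  # False: excluded by the upper bound
```

With the bounds as written, click 7.0 is rejected purely by the `<7.0` clause, which is the restriction in question.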

For now, I am planning to drop all the upper-bound restrictions on conda-forge except click but I would appreciate feedback on whether even this restriction is needed.

Thanks very much for creating and maintaining a really useful package for the community!

sirosen commented

> Thanks very much for creating and maintaining a really useful package for the community!

Thank you! 😄

Let's get down to business then:

> So far, I've been exactly copying the package restrictions from setup.py in each feedstock

Although I'm not familiar with conda packaging, this sounds to me like a good strategy in general.
If you install the CLI or SDK with dependency versions outside the ranges specified in setup.py, we can't guarantee support. If you stopped copying those restrictions, the worst case is a compatibility issue which a user encounters and which we are not ready or able to fix.

> I'm wondering if the upper bounds on click, configobj, requests, six and cryptography are necessary or if the developers are only being cautious.

It depends very much on the projects in question. I think the simplest way is for me to explain why I think each dependency should be bounded in the way that it is:

| package | explanation |
| --- | --- |
| click | 7.0 is a major release with backwards incompatible changes. We need to test against this version at the very least before releasing against it, and we need to consider moving the requirement to click>=7.0,<8 so that we can use new features like the new datetime type. However, given that several more, desirable changes are planned for 7.1, I've been putting off the effort of testing until that drops. (This is not formally tracked anywhere.) Perhaps we should test and release a CLI version allowing click>=6.6,<8. |
| configobj | Hasn't released in some time and follows semver. Any decision to do a 6.0 of configobj is fairly likely to break us, as it would mean that configobj is doing something major (e.g. dropping py27 support). |
| requests | This is a stable product at 2.x, and it follows semver. 3.x is more or less guaranteed to be backwards incompatible when/if it releases and we should not allow silent upgrades, as the SDK will probably be broken by those changes. |
| six | It's highly unlikely that six will ever release a 2.x version. However, if it were released, it's highly probable that it would break our usage. |
| cryptography | Uses an unusual, but documented, versioning scheme. Although we make conservative use of the cryptography API, and a wider range of versions would likely work, we pin to the guaranteed-compatible versions. Not doing so offers very little benefit, and we need to periodically update our packages to allow newer versions (which is fine). |

Although globus-cli may be (likely is) click==7.0 compatible, we can't offer any guarantee of that, and we're not ready to support it or fix bugs relating to it at this time.

Many globus-cli users are not experienced Python developers and aren't prepared to troubleshoot a package compatibility issue. So, with the CLI especially, we can't afford to risk exposing users to broken package versions.

> In the case of configobj, requests, six and cryptography, it seems merely precautionary, since no current versions meet the restrictions.

Setting upper bounds on version numbers in accordance with known versioning schemes is precautionary, sure, but that is the entire purpose of those versioning schemes.
For example, cryptography promises API stability across a specific range of versions and we therefore respect that convention.

> I believe it is considered best practices to leave versions open unless incompatibilities are known

I would very strongly contest this.

In spite of rumblings in the python community to use less informative versioning schemes (e.g. calendar dates), semantic versioning clearly expresses known and intentional breaking changes.

Therefore, I would say that

  • applications (e.g. CLI) should specify dependencies in a conservative version range, known to be non-broken
  • libraries may make a choice between
    • specify dependencies in a conservative version range
    • specify dependencies loosely, but document known-working versions

And that dependency versions should, unless a package has a history of breaking its versioning promises, be taken to mean what they say.
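
As a rough sketch of the two styles above, assuming the dependency follows semver so that only a new major version signals a breaking change: `conservative_requirement` and `loose_requirement` are hypothetical helper names, not part of setuptools or any Globus package.

```python
def conservative_requirement(name, known_good):
    """Bound below by a known-working version and above by the next major."""
    major = int(known_good.split(".")[0])
    return f"{name}>={known_good},<{major + 1}.0.0"

def loose_requirement(name, known_good):
    """Lower bound only; known-working versions would be documented elsewhere."""
    return f"{name}>={known_good}"

# Application style: stay inside the range the versioning scheme promises.
print(conservative_requirement("requests", "2.0.0"))  # requests>=2.0.0,<3.0.0
print(conservative_requirement("six", "1.10.0"))      # six>=1.10.0,<2.0.0

# One possible library style: leave the upper end open.
print(loose_requirement("requests", "2.0.0"))          # requests>=2.0.0
```

The conservative form reproduces the shape of the requests, six and configobj entries in the setup.py excerpt. It would not fit cryptography, whose versioning scheme is not plain semver.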

Ideally we would test against every version of every dependency and promise perfect compatibility at all times, but it's simply not doable.

If we had a great deal of confidence in our testsuite to cover all of the corner cases, we could test against click v7.0 and release a new version which allows for its use, in addition to click>=6.6. But given how much of the CLI and argument parsing is not covered by the tests, I think that would be reckless.
In addition to ensuring that automated tests pass, we need to put the application through its paces a bit with a major upgrade to such a central and key package. I just haven't had time for this yet.

Bear in mind that unlike changing the lower-bounds of dependencies, changing the upper bounds directly impacts the package versions installed by a naive pip install globus-cli. We don't want that command to ship a broken application.
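
A simplified illustration of that asymmetry, assuming an installer that picks the highest available version inside the allowed range. The `available` list and the `pick` helper are illustrative assumptions, not a model of pip's actual resolver.

```python
def as_tuple(version):
    """'6.6' -> (6, 6, 0); pads so that '7' and '7.0' compare equal."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def pick(available, lower, upper):
    """Return the highest version v with lower <= v < upper, or None."""
    candidates = [
        v for v in available
        if as_tuple(lower) <= as_tuple(v) < as_tuple(upper)
    ]
    return max(candidates, key=as_tuple, default=None)

available = ["6.6", "6.7", "7.0"]  # illustrative set of click releases

print(pick(available, "6.6", "7.0"))  # 6.7
print(pick(available, "6.7", "7.0"))  # 6.7 (raising the lower bound: same result)
print(pick(available, "6.6", "8.0"))  # 7.0 (raising the upper bound: new version)
```

Under this assumption, moving the lower bound leaves a fresh install unchanged as long as the newest allowed release still qualifies, while moving the upper bound immediately changes what a fresh install receives.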

> more restriction makes the likelihood of conflicts between packages that cannot be resolved much more likely

Yes, this is true and it is unfortunate.

Python's way of packaging and shipping dependencies is problematic for this reason, but it's beyond our control to fix, unless we vendor all of our dependencies (which is non-trivial).
Certain other package managers allow for multiple versions of a package used by multiple dependent tools to coexist, but Python does not have this capability without vendoring or an equivalent solution.
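
The kind of conflict being discussed can be sketched as an intersection of version ranges. This assumes a single shared environment in which only one version of click can be installed; `intersect` is a hypothetical helper and the second package's requirement is invented for illustration.

```python
def intersect(range_a, range_b):
    """Intersect two (lower, upper) ranges; upper is exclusive, None = unbounded.

    Returns the combined range, or None when no version satisfies both.
    """
    lower = max(range_a[0], range_b[0])
    uppers = [r[1] for r in (range_a, range_b) if r[1] is not None]
    upper = min(uppers) if uppers else None
    if upper is not None and lower >= upper:
        return None
    return (lower, upper)

cli_click = ((6, 6), (7, 0))    # click>=6.6,<7.0, as in the setup.py excerpt
other_click = ((7, 0), None)    # hypothetical other package: click>=7.0
widened = ((6, 6), (8, 0))      # click>=6.6,<8, the range floated above

print(intersect(cli_click, other_click))  # None: no single click version fits
print(intersect(widened, other_click))    # ((7, 0), (8, 0))
```

With one copy of each package per environment, an empty intersection means the two tools cannot be installed side by side, which is the cost of tighter bounds.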

We try to remain aware of this and specify loose version constraints as much as possible. Portability and compatibility with a wide range of versions is a high priority for these packages, and especially for the SDK.

> For now, I am planning to drop all the upper-bound restrictions on conda-forge except click but I would appreciate feedback on whether even this restriction is needed.

If you're asking for my input, I would obviously recommend against doing this.
The bounding we've done in our setup.py files is intentional and carefully considered, so removing those bounds is removing a guard rail we've put in place.

As we're quite conservative in our choices of packages to use, you're pretty unlikely to see any issues. However, I think this is leaving a land mine for someone to trip over in the future.


I'm going to mark as a question and close, but I don't mean to stifle further discussion. If you have other questions or want to discuss further, we can follow up here (potentially reopen).

xylar commented

@sirosen, I really appreciate your detailed discussion of your thinking process. The practices in globus tools seem unusual compared with the larger Python development community, but I now understand why these choices have been made. I will follow your lead on conda-forge. Thanks very much.

sirosen commented

Part of it, at least, is that we're solving different problems.
"Libraries" and "applications" are going to demonstrate different approaches to pinning.
Most of the big Python packages out there are libraries, and the mainstream programs that are meant to run on their own -- e.g. pip -- are often implemented purely in terms of the Python stdlib.

If you look at other python packages which are targeting non-developers, I think you're more likely to see conservative pinning. The only example I can think of offhand is aws-cli.

The Globus SDK's version constraints are fairly permissive by comparison, and the SDK has very few dependencies.
NB: globus-sdk does not pin the version of cryptography, so we're already being quite open.

While I'm not keen on changing the globus-cli requirements, I think you can take more liberties with globus-sdk so long as it doesn't subsequently impact the stability of globus-cli.

If there's ever an issue with globus-sdk being too restrictive, I think we would treat that as worth prioritizing and fixing, but as of yet I'm not aware of any.

xylar commented

@sirosen, I was just thinking about this discussion and wanted to reiterate that I really appreciate the time you took. In the intervening year, I've been maintaining a bunch of packages with varying degrees of strict vs. permissive dependency versioning. I've really come to appreciate what a scramble it can become when a new major version of a package is released and all the downstream packages suddenly realize their current builds aren't compatible. Just wanted to thank you again.