pypa/pip

Drop support for Python < 2.7.9

dstufft opened this issue · 34 comments

Older versions of Python 2.7 (prior to 2.7.9) can't be given a good TLS configuration, and thus it would be great to drop support for them. We're not currently at the point where we can do that, but I wanted to open this issue both to track that we should do it at some point, and to take notes about the current state and how to query it in the future.

Results:

Python Version Download Count Percent
>=2.7.9 268456729 53%
<2.7.9 155669164 31%
3.5 35605564 7%
3.4 23114420 5%
2.6 13023950 3%
3.6 11948118 2%
3.3 1278506 0.3%
3.7 231670 0.05%

The results can be queried with:

SELECT
  CASE
    WHEN REGEXP_MATCH(details.python, r"2\.7\.(9|\d\d)") THEN '>=2.7.9'
    WHEN REGEXP_MATCH(details.python, r"2\.7\.") THEN '<2.7.9'
    ELSE REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+)")
  END AS python_version,
  COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE( [the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"), DATE_ADD(CURRENT_TIMESTAMP(), -1, "day") )
WHERE
  details.installer.name = 'pip'
GROUP BY
  python_version
ORDER BY
  download_count DESC
LIMIT
  100

alex commented

Your query isn't totally correct: 2.7.9 itself doesn't match the >= regex; it matches the second one.

Ah right, I blame doing this early in the morning :)

Ok, updated the query and results in my first post to reflect the real numbers. Still not enough to drop support for it yet, but the numbers look a lot better this way.
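For anyone who wants to sanity-check the bucketing outside BigQuery, here is a rough Python sketch of what the CASE expression in the query does. It assumes REGEXP_MATCH behaves like an unanchored `re.search`; the function name `bucket` is made up for illustration and is not part of any pip or PyPI tooling.

```python
import re


def bucket(python_version):
    """Mimic the CASE expression from the query (illustrative helper)."""
    # 2.7.9, or any two-digit micro version (2.7.10, 2.7.13, ...)
    if re.search(r"2\.7\.(9|\d\d)", python_version):
        return ">=2.7.9"
    # Any other 2.7.x
    if re.search(r"2\.7\.", python_version):
        return "<2.7.9"
    # Everything else is reported as major.minor
    match = re.search(r"^([^\.]+\.[^\.]+)", python_version)
    return match.group(1) if match else None


for version in ["2.7.9", "2.7.13", "2.7.5", "2.6.6", "3.6.1"]:
    print(version, "->", bucket(version))
```

The order of the checks matters: the first pattern has to run before the catch-all `2\.7\.` pattern, otherwise every 2.7.x version would land in the same bucket.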

Does RedHat/CentOS 7 Python 2.7.5 count in those stats and also as having "bad TLS"? I'm a CentOS noob, but supposedly the RH-packaged Python has some fixes backported, i.e. are you talking about TLS 1.2?

Is your plan about denying pip based on Python <2.7.9, or a breaking change that earlier versions won't like (TLS 1.2)?

$ cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core) 
$ python -V
Python 2.7.5
$ python -c "import json, urllib2; print json.load(urllib2.urlopen(\
    'https://www.howsmyssl.com/a/check'))['tls_version']"
TLS 1.2

@nickstenning Those statistics count whatever is returned by platform.python_version() on that Python, so presumably the RHEL/CentOS 2.7.5 is showing up as 2.7.5 and thus is <2.7.9. I don't know what patches they've applied.

To be clear, this is not a short-term issue. It is mostly a placeholder to track how >=2.7.9 adoption is going, so we can decide when the right time to drop support is. That is unlikely to be before <2.7.9 usage gets into single digits.

True, I guess I'm just alluding to how RHEL versions are super-clingy (10-year support, so 7, which has 2.7.5, might be around in reasonable numbers until 2024). That might keep the "<2.7.9" number artificially high, while not truly being reflective of the users you'd disrupt.

Is there a reasonable/reliable way to identify the actual Linux distribution that's being used?

Here are today's numbers:

Python Version Download Count Percent Delta
>=2.7.9 293349226 54% +1%
<2.7.9 141872623 26% -5%
3.5 46714212 9% +2%
3.4 23322504 4% +1%
3.6 22756263 4% +2%
2.6 14454324 3% +0%
3.3 1321803 0.2% -0.1%
3.7 285099 0.05% +0%

Please let me know if I'm way off-base and looking at the wrong thing... To examine that <2.7.9 group a bit:

Python Version  Distro        Distro Version  Downloads  %       urllib2 uses TLS 1.2
2.7.6           Ubuntu        14.04           54809493   10.07%  yes*
2.7.6           null          null            37728581   6.94%   ???
2.7.5           CentOS Linux  7               17152820   3.15%   yes*
2.7.3           Ubuntu        12.04           6890300    1.27%   yes*
2.7.5           CentOS Linux  7.3.1611        3596891    0.66%   yes
2.7.5           CentOS Linux  7.2.1511        2534648    0.47%   yes
2.7.3           null          null            2504846    0.46%   ???
2.7.6           Ubuntu        12.04           2355638    0.43%   yes*
2.7.5           null          null            2329032    0.43%   ???
2.7.5           RHEL Server   7.3             2223663    0.41%   yes
2.7.6           debian        jessie/sid      841274     0.15%   ?‡
2.7.3           debian        7.11            748925     0.14%   ?‡

* depends on patch level
† this could plausibly be Ubuntu 14.04/16.04/CentOS 7?
‡ don't have any Debian systems

Query (full results):

SELECT
  details.python as python_version,
  details.distro.name as distro_name,
  details.distro.version as distro_version,
  COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE( [the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"), DATE_ADD(CURRENT_TIMESTAMP(), -1, "day") )
WHERE
  details.installer.name = 'pip'
  AND REGEXP_MATCH(details.python, r'2\.7\.[0-8]($|[^\d])')
GROUP BY
  python_version, distro_name, distro_version
ORDER BY
  download_count DESC
LIMIT
  100
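The regex in the WHERE clause can be tried out locally. This is a small sketch, again assuming REGEXP_MATCH is an unanchored search like Python's `re.search`; `is_lt_279` is a hypothetical name used only for this example.

```python
import re

# Same pattern as the WHERE clause: a micro version of 0-8, followed by
# either the end of the string or a non-digit character.
LT_279 = re.compile(r"2\.7\.[0-8]($|[^\d])")


def is_lt_279(python_version):
    """True if the reported version looks like 2.7.0 through 2.7.8."""
    return LT_279.search(python_version) is not None


for version in ["2.7.5", "2.7.6+", "2.7.9", "2.7.10", "2.7.13"]:
    print(version, is_lt_279(version))
```

The trailing `($|[^\d])` is what keeps 2.7.10 and later from being counted as 2.7.1: the character after the single digit must not be another digit.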

"urllib2 uses TLS 1.2" checked by the output of python -c "import json, urllib2; print json.load(urllib2.urlopen('https://www.howsmyssl.com/a/check'))['tls_version']"

  • Ubuntu 12.04.5 LTS / Python 2.7.3 (default, Oct 26 2016, 21:01:49)
  • Ubuntu 14.04.5 LTS / Python 2.7.6 (default, Oct 26 2016, 20:30:19)
  • Ubuntu 14.04.5 LTS / Python 2.7.6 (default, Jun 22 2015, 17:58:13)
  • CentOS Linux release 7.2.1511 / Python 2.7.5 (default, Aug 18 2016, 15:58:25)
  • CentOS Linux release 7.3.1611 / Python 2.7.5 (default, Nov 6 2016, 00:28:07)
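As a complement to the network check, a sketch like the following reports what the local ssl module exposes without making any connection. The helper name `tls_capabilities` is an assumption for illustration. Note that a missing `PROTOCOL_TLSv1_2` constant does not prove TLS 1.2 cannot be negotiated, which is exactly what the patched distro Pythons above suggest.

```python
import ssl
import sys


def tls_capabilities():
    """Report what the local ssl module exposes (no network access)."""
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        # SSLContext was backported to Python 2 in 2.7.9
        "has_sslcontext": hasattr(ssl, "SSLContext"),
        # Whether the explicit TLS 1.2 protocol constant is exposed
        "has_tls_1_2_constant": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


for key, value in sorted(tls_capabilities().items()):
    print("%s: %s" % (key, value))
```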

Going by a thread on Python-Dev, which points to https://github.com/ouspg/trytls/tree/shootout-0.2/shootout, I'm not sure if there's a problem with the functionality of TLS in older distros/Python 2.7.x's (insecure as it may be, but then older TLS versions are insecure anyway?), or if it would actually break things.

Oh, looks like the TLS protocol is logged to the DB:

Python  TLSv1  TLSv1.1  TLSv1.2  Totals
2.6     0.19%  0.01%    2.45%    2.66%
<2.7.9  0.27%  0.25%    25.55%   26.07%
≥2.7.9  2.66%  0.31%    50.93%   53.90%
3.3     0.00%  0.03%    0.21%    0.24%
3.4     0.04%  0.11%    4.14%    4.28%
3.5     0.06%  0.15%    8.37%    8.58%
3.6     0.00%  0.09%    4.09%    4.18%
3.7     -      0.01%    0.04%    0.05%
Totals  3.22%  0.96%    95.79%   99.96%

I guess I'm now even more confused as most of the old TLS connections (in number and proportion) are made by Python ≥2.7.9.

@nicktimko I think that is macOS.

I was curious how this compared over time, so here is the percent of downloads using >=2.7,<2.7.9:

Month Percent of Downloads >=2.7,<2.7.9 Delta
2017-06 24.5 -1.6%
2017-05 26.1 -3%
2017-04 29.1 -0.4%
2017-03 29.5 -3.4%
2017-02 32.9 -3.2%
2017-01 36.1 -0.9%
2016-12 37.0 -0.2%
2016-11 37.2 -1.6%
2016-10 38.8 -3.6%
2016-09 42.4 -3.3%
2016-08 45.7 -0.5%
2016-07 46.2 -1.5%
2016-06 47.7

Gotten using this query:

SELECT
  STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm,
  ROUND(100 * SUM(CASE
        WHEN REGEXP_MATCH(details.python, r"2\.7\.(9|\d\d)") THEN 0
        WHEN REGEXP_MATCH(details.python, r"2\.7\.") THEN 1
        ELSE 0 END) / COUNT(*), 1) AS percent_lt279,
  COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "year"),
    CURRENT_TIMESTAMP()
  )
WHERE
  details.installer.name = 'pip'
GROUP BY
  yyyymm
ORDER BY
  yyyymm DESC
LIMIT
  100

What's the change you want to make that would break <2.7.9?

@nicktimko Remove the need to continue to support emulation for SSLContext objects, which would also free up the ability to start trusting the platform network store on Linux machines and to allow Python to start validating the hostname instead of having to copy that functionality into requests. It also will allow us to start mandating TLSv1.2+ on the client side.
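To make that concrete, here is a minimal sketch of the kind of client configuration a real SSLContext allows. It is not pip's code: `strict_client_context` is a hypothetical helper, and it assumes Python 3.7+ for `minimum_version` (on 2.7.9+ the same idea would presumably use the `OP_NO_*` option flags instead).

```python
import ssl


def strict_client_context():
    """Build a client context that verifies hostnames and requires TLS 1.2+."""
    # create_default_context() loads the platform's default CA certificates
    # and turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS 1.2 (Python 3.7+ attribute).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = strict_client_context()
print(ctx.check_hostname, ctx.verify_mode, ctx.minimum_version)
```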

I'm curious as to what dropping this support would look like: will the current warning be changed into a fatal error, with pip aborting, or will it still be allowed, albeit requiring the end user to jump through hoops to make it work?

I think it should be the former.

I was curious how this compared over time,

It looks to me like the amount of requests from Python < 2.7.9 will stay significant for another 6 months+?

@pradyunsg It's not entirely defined, but if we do it like we've done the other ones, we'll just drop support, update the python_requires in the setup.py and be done with it. That means that on pip<9 it will just install it and possibly fail at runtime from something incompatible and on pip>=9 it will ignore it when looking at PyPI and will fail if you attempt to install it anyways.

We could add an install-time check in setup.py as well, if we felt so inclined.

We could add an install-time check in setup.py as well, if we felt so inclined.

Sounds good.
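A rough sketch of what such an install-time check could look like near the top of a setup.py. This is illustrative only and not what pip ships; `unsupported_python_message` is a made-up name, and it only handles the 2.7.0 through 2.7.8 range discussed here.

```python
import sys


def unsupported_python_message(version_info=sys.version_info):
    """Return an error message for Python 2.7.0-2.7.8, otherwise None."""
    version = tuple(version_info[:3])
    if version[:2] == (2, 7) and version < (2, 7, 9):
        return (
            "This release requires Python 2.7.9 or newer; found %d.%d.%d"
            % version
        )
    return None


message = unsupported_python_message()
if message:
    # Abort before setup() runs so the failure is explicit at install time.
    sys.exit(message)
```

Unlike python_requires, a check like this also triggers on installers that ignore that metadata, at the cost of only failing once the sdist has already been downloaded.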

It looks to me like the amount of requests from Python < 2.7.9 will stay significant for another 6 months+?

We're now 6 months on, how are the numbers looking now?

Running the query @dstufft posted above (#4350 (comment)):

Month Downloads % of >=2.7,<2.7.9
2018-01 17.8
2017-12 17.6
2017-11 18.5
2017-10 19.7
2017-09 21.3
2017-08 22.8
2017-07 24.1
2017-06 24.0
2017-05 26.1
2017-04 29.1
2017-03 29.5

It's receded slower than I'd anticipated.

Thanks for running the query. Here are the same numbers with deltas, combined with the earlier numbers from #4350 (comment):

Month Downloads % of >=2.7,<2.7.9 Delta
2018-01 17.8 0.2%
2017-12 17.6 -0.9%
2017-11 18.5 -1.2%
2017-10 19.7 -1.6%
2017-09 21.3 -1.5%
2017-08 22.8 -1.3%
2017-07 24.1 0.1%
2017-06 24 -2.1%
2017-05 26.1 -3.0%
2017-04 29.1 -0.4%
2017-03 29.5 -3.4%
2017-02 32.9 -3.2%
2017-01 36.1 -0.9%
2016-12 37 -0.2%
2016-11 37.2 -1.6%
2016-10 38.8 -3.6%
2016-09 42.4 -3.3%
2016-08 45.7 -0.5%
2016-07 46.2 -1.5%
2016-06 47.7  
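The Delta column is just each month's percentage minus the previous month's. A small sketch of that arithmetic, using the first few rows of the table above (oldest first); `monthly_deltas` is a name invented for this example.

```python
# Percentages copied from the table above, oldest first (a subset for brevity).
percent_lt279 = [
    ("2016-06", 47.7),
    ("2016-07", 46.2),
    ("2016-08", 45.7),
    ("2016-09", 42.4),
]


def monthly_deltas(rows):
    """Return (month, delta) pairs; delta is this month minus the previous one."""
    return [
        (month, round(value - previous, 1))
        for (_, previous), (month, value) in zip(rows, rows[1:])
    ]


for month, delta in monthly_deltas(percent_lt279):
    print(month, delta)
```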

And charted:

image

And with a trendline:

image

Thanks @hugovk! ^.^

Looking at the general trend, I think we should come back to this in ~4/5 months from now. I propose 14 June 2018. :P

@dstufft @di @ewdurbin Did pip on Python < 2.7.9 break due to the removal of TLS 1.0/1.1 support from PyPI?

(sorry if I'm too noisy)

5j9 commented

Did pip on Python < 2.7.9 break due to the removal of TLS 1.0/1.1 support from PyPI?

I know that AppVeyor jobs for Python < 2.7.9 (also 3.4.0 but not 3.4.1+), installed using msi Windows installers, all started to fail because of the TLS issue and pip not being able to communicate with PyPI. I don't know about Linux or source compiled installations though.

Today's numbers look like:

Python Version Download Count Percent
>=2.7.9 258067087 54.76%
3.6 89836948 19.06%
<2.7.9 68652338 14.57%
3.5 36294439 7.70%
3.4 14786255 3.14%
2.6 2940512 0.62%
3.7 436672 0.09%
3.3 193578 0.04%

Month Downloads % of >=2.7,<2.7.9 Delta
2018-05 14.0 -1.1%
2018-04 15.1 -0.1%
2018-03 15.2 -0.7%
2018-02 15.9 -1.8%
2018-01 17.7 0.1%
2017-12 17.6 -0.9%
2017-11 18.5 -1.2%
2017-10 19.7 -1.6%
2017-09 21.3 -1.5%
2017-08 22.8 -1.3%
2017-07 24.1 0.1%
2017-06 24 -2.1%
2017-05 26.1 -3.0%
2017-04 29.1 -0.4%
2017-03 29.5 -3.4%
2017-02 32.9 -3.2%
2017-01 36.1 -0.9%
2016-12 37 -0.2%
2016-11 37.2 -1.6%
2016-10 38.8 -3.6%
2016-09 42.4 -3.3%
2016-08 45.7 -0.5%
2016-07 46.2 -1.5%
2016-06 47.7  

Charted:

image

With trendline:

image

That linear trendline is misleading; it will be an exponential with a tail that will never go to 0 (until it's forced to)

Yeah, it does demonstrate that the decline is slowing.
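To illustrate the difference between the two kinds of trendline, here is a sketch that fits both a straight line and an exponential decay to the monthly percentages from the table above (2016-06 through 2018-05). It uses a plain least-squares fit, with the exponential fitted through the logarithm of the values; this is an assumed method for illustration, not how the charts in this thread were produced.

```python
import math

# Monthly percentages from the table above, 2016-06 through 2018-05, oldest first.
percent_lt279 = [
    47.7, 46.2, 45.7, 42.4, 38.8, 37.2, 37.0, 36.1, 32.9, 29.5, 29.1, 26.1,
    24.0, 24.1, 22.8, 21.3, 19.7, 18.5, 17.6, 17.7, 15.9, 15.2, 15.1, 14.0,
]


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


months = list(range(len(percent_lt279)))
lin_a, lin_b = least_squares(months, percent_lt279)
# Exponential decay y = exp(a + b * t), fitted as a straight line through log(y).
exp_a, exp_b = least_squares(months, [math.log(p) for p in percent_lt279])


def linear(t):
    return lin_a + lin_b * t


def exponential(t):
    return math.exp(exp_a + exp_b * t)


# Extrapolate 24 to 60 months after 2016-06.
for t in (24, 36, 48, 60):
    print(t, round(linear(t), 1), round(exponential(t), 1))
```

The linear fit eventually predicts a negative share, which cannot happen, while the exponential fit only approaches zero.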

Running this BigQuery again, for the past 6 months:

yyyymm percent_lt279 download_count
2019-10 3.5 1919909312
2019-09 3.7 3214786202
2019-08 4.2 3076675931
2019-07 4.5 3033745291
2019-06 5.3 2729569347
2019-05 5.9 2757051716
2019-04 6.2 828207478

It's less than 5% now, so I'm happy to completely drop support in our next release.

@pradyunsg general follow up to what I was on about way earlier: is this about removing TLS 1.0 support or explicitly checking and rejecting Python <2.7.9? If the former, can you query against the TLS version used rather than the Python version reported?

Charted, this time with a polynomial trendline:

image

@nicktimko I think the biggest thing is just simply reducing the surface area of support we have. Generally we decide this on an X.Y basis, because that draws the cleanest lines around major changes. However, due to 2.7's age, 2.7.9 is a bit of a special case in that it introduced a backported ssl module from Python 3.

This work would effectively allow us to take better advantage of the capabilities of that backported ssl module. For instance, we can use the configuration settings so that pip itself will not function with older versions of TLS, we can possibly start relying on the platform trust stores in more situations, etc.

But really the biggest thing is narrowing the supported configurations.

To @pradyunsg I think sub 5% is a perfectly fine point to drop support for older versions of 2.7.

Filed #7362, where we can discuss how we'd do the removal. If someone wants to discuss if, when or why, please continue to do so in this issue.

For some reason GitHub gives me a 500 error on #7362, so I can't see the close reason.

Anyway, do you think this should be swept into the general Python 2.7.* removals after pip 20.3 is released in October 2020?

#6148 (comment)

If so, close this?