Drop support for Python < 2.7.9
dstufft opened this issue · 34 comments
Older versions of Python 2.7 (prior to 2.7.9) don't have the capability to have a good TLS configuration, and thus it would be great to drop support for them. We're not currently at the point that we can do that, but I wanted to open this issue to both track that we should do that at some point, and also take notes about the current state and how to query in the future.
Results:
Python Version | Download Count | Percent |
---|---|---|
>=2.7.9 | 268456729 | 53% |
<2.7.9 | 155669164 | 31% |
3.5 | 35605564 | 7% |
3.4 | 23114420 | 5% |
2.6 | 13023950 | 3% |
3.6 | 11948118 | 2% |
3.3 | 1278506 | 0.3% |
3.7 | 231670 | 0.05% |
The results can be queried with:
SELECT
CASE
WHEN REGEXP_MATCH(details.python, r"2\.7\.(9|\d\d)") THEN '>=2.7.9'
WHEN REGEXP_MATCH(details.python, r"2\.7\.") THEN '<2.7.9'
ELSE REGEXP_EXTRACT(details.python, r"^([^\.]+\.[^\.]+)")
END AS python_version,
COUNT(*) AS download_count,
FROM
TABLE_DATE_RANGE( [the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"), DATE_ADD(CURRENT_TIMESTAMP(), -1, "day") )
WHERE
details.installer.name = 'pip'
GROUP BY
python_version,
ORDER BY
download_count DESC
LIMIT
100
Your query isn't totally correct: 2.7.9 itself doesn't match the >= regex, it matches the second one.
Ah right, I blame doing this early in the morning :)
Ok, updated the query and results in my first post to reflect the real numbers. Still not enough to drop support for it yet, but the numbers look a lot better this way.
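For anyone who wants to sanity-check the bucketing without running BigQuery, here is a rough Python sketch of the same CASE logic. It assumes the regexes are applied as unanchored searches, as the query implies; the function name `bucket` is made up for illustration.

```python
import re


def bucket(python_version):
    """Mirror the CASE expression: split 2.7 at 2.7.9, else keep major.minor."""
    # 2.7.9 itself and any two-digit patch level (2.7.10 and later) are ">=2.7.9".
    if re.search(r"2\.7\.(9|\d\d)", python_version):
        return ">=2.7.9"
    # Any other 2.7.x release falls through to "<2.7.9".
    if re.search(r"2\.7\.", python_version):
        return "<2.7.9"
    # Everything else is reported as "major.minor", e.g. "3.5".
    match = re.match(r"^([^\.]+\.[^\.]+)", python_version)
    return match.group(1) if match else None


for version in ["2.7.5", "2.7.9", "2.7.13", "3.5.2", "2.6.6"]:
    print(version, "->", bucket(version))
```

The order of the two 2.7 checks matters: the broader `2\.7\.` pattern has to come second, otherwise every 2.7.x release lands in the same bucket.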
Does RedHat/CentOS 7 Python 2.7.5 count in those stats and also as having "bad TLS"? I'm a CentOS noob, but supposedly the RH-packaged Python has some fixes backported, i.e. are you talking about TLS 1.2?
Is your plan about denying pip based on Python <2.7.9, or a breaking change that earlier versions won't like (TLS 1.2)?
$ cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
$ python -V
Python 2.7.5
$ python -c "import json, urllib2; print json.load(urllib2.urlopen(\
'https://www.howsmyssl.com/a/check'))['tls_version']"
TLS 1.2
@nickstenning Those statistics count whatever is returned by platform.python_version() on that Python, so presumably the RHEL/CentOS 2.7.5 is showing up as 2.7.5 and thus is <2.7.9. I don't know what patches they've applied.
To be clear, this is not a short-term issue. It is mostly a placeholder to track how >=2.7.9 adoption is going, so that we can decide when it is the right time to drop support. That is unlikely to be before <2.7.9 gets into single-digit usage.
True, I guess I'm just alluding to how RHEL versions are super-clingy (10-year support, so 7, which has 2.7.5, might be around in reasonable numbers until 2024). That might keep the "<2.7.9" number artificially high, while not truly being reflective of the users you'd disrupt.
Is there a reasonable/reliable way to identify the actual distribution of Linux that's being used?
Here are today's numbers:
Python Version | Download Count | Percent | Delta |
---|---|---|---|
>=2.7.9 | 293349226 | 54% | +1% |
<2.7.9 | 141872623 | 26% | -5% |
3.5 | 46714212 | 9% | +2% |
3.4 | 23322504 | 4% | +1% |
3.6 | 22756263 | 4% | +2% |
2.6 | 14454324 | 3% | +0% |
3.3 | 1321803 | 0.2% | -0.1% |
3.7 | 285099 | 0.05% | +0% |
Please let me know if I'm way off-base and looking at the wrong thing... To examine that <2.7.9 group a bit:
Python Version | Distro | Downloads | % | urllib2 uses TLS 1.2 |
---|---|---|---|---|
2.7.6 | Ubuntu 14.04 | 54809493 | 10.07% | yes* |
2.7.6 | null null | 37728581 | 6.94% | ???† |
2.7.5 | CentOS Linux 7 | 17152820 | 3.15% | yes* |
2.7.3 | Ubuntu 12.04 | 6890300 | 1.27% | yes* |
2.7.5 | CentOS Linux 7.3.1611 | 3596891 | 0.66% | yes |
2.7.5 | CentOS Linux 7.2.1511 | 2534648 | 0.47% | yes |
2.7.3 | null null | 2504846 | 0.46% | ???† |
2.7.6 | Ubuntu 12.04 | 2355638 | 0.43% | yes* |
2.7.5 | null null | 2329032 | 0.43% | ???† |
2.7.5 | RHEL Server 7.3 | 2223663 | 0.41% | yes |
2.7.6 | debian jessie/sid | 841274 | 0.15% | ?‡ |
2.7.3 | debian 7.11 | 748925 | 0.14% | ?‡ |
* depends on patch level
† this could plausibly be Ubuntu 14.04/16.04/CentOS 7?
‡ don't have any Debian systems
Query (full results):
SELECT
details.python as python_version,
details.distro.name as distro_name,
details.distro.version as distro_version,
COUNT(*) AS download_count,
FROM
TABLE_DATE_RANGE( [the-psf:pypi.downloads], DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"), DATE_ADD(CURRENT_TIMESTAMP(), -1, "day") )
WHERE
details.installer.name = 'pip'
AND REGEXP_MATCH(details.python, r'2\.7\.[0-8]($|[^\d])')
GROUP BY
python_version, distro_name, distro_version
ORDER BY
download_count DESC
LIMIT
100
"urllib2 uses TLS 1.2" was checked using the output of python -c "import json, urllib2; print json.load(urllib2.urlopen('https://www.howsmyssl.com/a/check'))['tls_version']" on:
- Ubuntu 12.04.5 LTS / Python 2.7.3 (default, Oct 26 2016, 21:01:49)
- Ubuntu 14.04.5 LTS / Python 2.7.6 (default, Oct 26 2016, 20:30:19)
- Ubuntu 14.04.5 LTS / Python 2.7.6 (default, Jun 22 2015, 17:58:13)
- CentOS Linux release 7.2.1511 / Python 2.7.5 (default, Aug 18 2016, 15:58:25)
- CentOS Linux release 7.3.1611 / Python 2.7.5 (default, Nov 6 2016, 00:28:07)
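The one-liner above is Python 2 only. A rough Python 3 equivalent might look like the sketch below; the function names are made up, and the sample body is a trimmed, hypothetical response that contains only the `tls_version` field the check reads. The network call is left uncalled so the snippet runs offline.

```python
import json
from urllib.request import urlopen

CHECK_URL = "https://www.howsmyssl.com/a/check"


def tls_version_from_body(body):
    """Pull the negotiated TLS version out of the JSON response body."""
    return json.loads(body)["tls_version"]


def check_tls_version(url=CHECK_URL):
    """Fetch the check endpoint and report which TLS version was negotiated."""
    with urlopen(url) as response:
        return tls_version_from_body(response.read().decode("utf-8"))


# Offline demonstration with a hypothetical, trimmed response body.
sample_body = '{"tls_version": "TLS 1.2"}'
print(tls_version_from_body(sample_body))
```

Calling `check_tls_version()` on the interpreter you care about should report what that Python's ssl stack negotiates by default.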
Based on a thread on Python-Dev, which points to https://github.com/ouspg/trytls/tree/shootout-0.2/shootout, I'm not sure whether there's a problem with the functionality of TLS in older distros/Python 2.7.x's (insecure as it may be, but then older TLS versions are insecure anyway?), or whether it would actually break things.
Oh, looks like the TLS protocol is logged to the DB:
Python | TLSv1 | TLSv1.1 | TLSv1.2 | Totals |
---|---|---|---|---|
2.6 | 0.19% | 0.01% | 2.45% | 2.66% |
<2.7.9 | 0.27% | 0.25% | 25.55% | 26.07% |
≥2.7.9 | 2.66% | 0.31% | 50.93% | 53.90% |
3.3 | 0.00% | 0.03% | 0.21% | 0.24% |
3.4 | 0.04% | 0.11% | 4.14% | 4.28% |
3.5 | 0.06% | 0.15% | 8.37% | 8.58% |
3.6 | 0.00% | 0.09% | 4.09% | 4.18% |
3.7 | | 0.01% | 0.04% | 0.05% |
Totals | 3.22% | 0.96% | 95.79% | 99.96% |
I guess I'm now even more confused as most of the old TLS connections (in number and proportion) are made by Python ≥2.7.9.
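To make the "in proportion" part concrete, here is a small back-of-the-envelope calculation using only the two 2.7 rows of the table above. The dictionary layout and function name are just for illustration.

```python
# Percent of all pip downloads, copied from the two 2.7 rows of the table above.
shares = {
    "<2.7.9": {"TLSv1": 0.27, "TLSv1.1": 0.25, "TLSv1.2": 25.55},
    ">=2.7.9": {"TLSv1": 2.66, "TLSv1.1": 0.31, "TLSv1.2": 50.93},
}


def old_tls_fraction(row):
    """Fraction of a bucket's downloads that used TLS 1.0 or TLS 1.1."""
    old = row["TLSv1"] + row["TLSv1.1"]
    return old / sum(row.values())


for bucket_name, row in shares.items():
    print(bucket_name, "{:.2%}".format(old_tls_fraction(row)))
```

Within each bucket, the share of connections on TLS 1.0/1.1 is higher for >=2.7.9 than for <2.7.9, which is the surprising part.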
@nicktimko I think that is macOS.
I was curious how this compared over time, so here is the percent of downloads using >=2.7,<2.7.9:
Month | Percent of Downloads >=2.7,<2.7.9 | Delta |
---|---|---|
2017-06 | 24.5 | -1.6% |
2017-05 | 26.1 | -3% |
2017-04 | 29.1 | -0.4% |
2017-03 | 29.5 | -3.4% |
2017-02 | 32.9 | -3.2% |
2017-01 | 36.1 | -0.9% |
2016-12 | 37.0 | -0.2% |
2016-11 | 37.2 | -1.6% |
2016-10 | 38.8 | -3.6% |
2016-09 | 42.4 | -3.3% |
2016-08 | 45.7 | -0.5% |
2016-07 | 46.2 | -1.5% |
2016-06 | 47.7 | |
Gotten using this query:
SELECT
STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm,
ROUND(100 * SUM(CASE
WHEN REGEXP_MATCH(details.python, r"2\.7\.(9|\d\d)") THEN 0
WHEN REGEXP_MATCH(details.python, r"2\.7\.") THEN 1
ELSE 0 END) / COUNT(*), 1) AS percent_lt279,
COUNT(*) AS download_count
FROM
TABLE_DATE_RANGE(
[the-psf:pypi.downloads],
DATE_ADD(CURRENT_TIMESTAMP(), -1, "year"),
CURRENT_TIMESTAMP()
)
WHERE
details.installer.name = 'pip'
GROUP BY
yyyymm
ORDER BY
yyyymm DESC
LIMIT
100
What's the change that wants to be made that would break <2.7.9?
@nicktimko Remove the need to continue to support emulation for SSLContext objects, which would also free up the ability to start trusting the platform network store on Linux machines and to allow Python to start validating the hostname instead of having to copy that functionality into requests. It also will allow us to start mandating TLSv1.2+ on the client side.
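As a rough illustration of what a real SSLContext makes possible, here is a minimal sketch using the modern standard-library ssl API. This is not pip's actual code, and `make_strict_context` is a hypothetical name. It assumes Python 3.7+, where `minimum_version` exists; on 2.7.9 through 3.6 the same effect would presumably come from setting the `ssl.OP_NO_TLSv1` and `ssl.OP_NO_TLSv1_1` options instead.

```python
import ssl


def make_strict_context():
    """Build a client context that verifies hostnames and requires TLS 1.2+."""
    # create_default_context() turns on certificate verification and hostname
    # checking, and loads the platform's default CA certificates.
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.2 (attribute assumed available: 3.7+).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_strict_context()
print(context.check_hostname, context.verify_mode, context.minimum_version)
```

None of this is available on Python older than 2.7.9, which has no SSLContext at all, hence the emulation mentioned above.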
I'm curious as to what dropping this support would look like: will the current warning become a fatal error that makes pip abort, or will it still be allowed, albeit with the end user having to jump through hoops to make it work?
I think it should be the former.
> I was curious how this compared over time,
It looks to me like the amount of requests from Python < 2.7.9 will stay significant for another 6+ months?
@pradyunsg It's not entirely defined, but if we do it like we've done the other ones, we'll just drop support, update the python_requires in the setup.py and be done with it. That means that on pip<9 it will just install it and possibly fail at runtime from something incompatible, and on pip>=9 it will ignore it when looking at PyPI and will fail if you attempt to install it anyways.
We could add an install-time check in setup.py if we so felt inclined to do so as well.
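For illustration, an install-time check along those lines could be as small as the sketch below. The function names and the error message are made up, and it only covers the 2.7.9 floor discussed in this issue, not the full set of supported versions.

```python
import sys

MINIMUM_VERSION = (2, 7, 9)  # hypothetical constant, matching this issue's cutoff


def is_supported(version_info):
    """Return True if the interpreter is at least Python 2.7.9."""
    return tuple(version_info[:3]) >= MINIMUM_VERSION


def ensure_supported(version_info=sys.version_info):
    """Abort early with a readable message instead of failing at runtime."""
    if not is_supported(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        raise SystemExit("Python >= 2.7.9 is required, found " + found)


# In a setup.py this would run before the call to setup().
ensure_supported()
```

Such a check would mainly help users on pip<9, which ignores python_requires and would otherwise install a release that cannot run.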
> We could add an install-time check in setup.py if we so felt inclined to do so as well.
Sounds good.
> It looks to me like the amount of requests from Python < 2.7.9 will stay significant for another 6+ months?
We're now 6 months on, how are the numbers looking now?
Running the query @dstufft posted above (#4350 (comment)):
Month | Downloads % of >=2.7,<2.7.9 |
---|---|
2018-01 | 17.8 |
2017-12 | 17.6 |
2017-11 | 18.5 |
2017-10 | 19.7 |
2017-09 | 21.3 |
2017-08 | 22.8 |
2017-07 | 24.1 |
2017-06 | 24.0 |
2017-05 | 26.1 |
2017-04 | 29.1 |
2017-03 | 29.5 |
It's receded slower than I'd anticipated.
Thanks for running the query. Here are the same numbers with deltas, combined with the earlier numbers from #4350 (comment):
Month | Downloads % of >=2.7,<2.7.9 | Delta |
---|---|---|
2018-01 | 17.8 | 0.2% |
2017-12 | 17.6 | -0.9% |
2017-11 | 18.5 | -1.2% |
2017-10 | 19.7 | -1.6% |
2017-09 | 21.3 | -1.5% |
2017-08 | 22.8 | -1.3% |
2017-07 | 24.1 | 0.1% |
2017-06 | 24 | -2.1% |
2017-05 | 26.1 | -3.0% |
2017-04 | 29.1 | -0.4% |
2017-03 | 29.5 | -3.4% |
2017-02 | 32.9 | -3.2% |
2017-01 | 36.1 | -0.9% |
2016-12 | 37 | -0.2% |
2016-11 | 37.2 | -1.6% |
2016-10 | 38.8 | -3.6% |
2016-09 | 42.4 | -3.3% |
2016-08 | 45.7 | -0.5% |
2016-07 | 46.2 | -1.5% |
2016-06 | 47.7 |
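The Delta column is just the month-over-month difference in percentage points. A minimal sketch of that arithmetic, using only the first five months of the table above (the function name is made up):

```python
# Oldest first; values copied from the table above (first five months only).
monthly = [
    ("2016-06", 47.7),
    ("2016-07", 46.2),
    ("2016-08", 45.7),
    ("2016-09", 42.4),
    ("2016-10", 38.8),
]


def deltas(series):
    """Pair each month with its change from the previous month, in points."""
    return [
        (month, round(value - previous, 1))
        for (_, previous), (month, value) in zip(series, series[1:])
    ]


for month, delta in deltas(monthly):
    print(month, delta)
```

The first month has no delta because there is nothing earlier to compare it against.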
And charted:
And with a trendline:
Thanks @hugovk! ^.^
Looking at the general trend, I think we should come back to this in ~4/5 months from now. I propose 14 June 2018. :P
Did pip on Python < 2.7.9 break due to the removal of TLS 1.0/1.1 support from PyPI?
I know that AppVeyor jobs for Python < 2.7.9 (also 3.4.0 but not 3.4.1+), installed using msi Windows installers, all started to fail because of the TLS issue and pip not being able to communicate with PyPI. I don't know about Linux or source compiled installations though.
Today's numbers look like:
Python Version | Download Count | Percent |
---|---|---|
>=2.7.9 | 258067087 | 54.76% |
3.6 | 89836948 | 19.06% |
<2.7.9 | 68652338 | 14.57% |
3.5 | 36294439 | 7.70% |
3.4 | 14786255 | 3.14% |
2.6 | 2940512 | 0.62% |
3.7 | 436672 | 0.09% |
3.3 | 193578 | 0.04% |

Month | Downloads % of >=2.7,<2.7.9 | Delta |
---|---|---|
2018-05 | 14.0 | -1.1% |
2018-04 | 15.1 | -0.1% |
2018-03 | 15.2 | -0.7% |
2018-02 | 15.9 | -1.8% |
2018-01 | 17.7 | 0.1% |
2017-12 | 17.6 | -0.9% |
2017-11 | 18.5 | -1.2% |
2017-10 | 19.7 | -1.6% |
2017-09 | 21.3 | -1.5% |
2017-08 | 22.8 | -1.3% |
2017-07 | 24.1 | 0.1% |
2017-06 | 24 | -2.1% |
2017-05 | 26.1 | -3.0% |
2017-04 | 29.1 | -0.4% |
2017-03 | 29.5 | -3.4% |
2017-02 | 32.9 | -3.2% |
2017-01 | 36.1 | -0.9% |
2016-12 | 37 | -0.2% |
2016-11 | 37.2 | -1.6% |
2016-10 | 38.8 | -3.6% |
2016-09 | 42.4 | -3.3% |
2016-08 | 45.7 | -0.5% |
2016-07 | 46.2 | -1.5% |
2016-06 | 47.7 |
That linear trendline is misleading; it will be an exponential with a tail that will never go to 0 (until it's forced to)
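To illustrate the point, here is a rough sketch that fits both a straight line and an exponential to the monthly percentages from the table above (2016-06 through 2018-05). The exponential is fitted by ordinary least squares on the logarithm of the values, which is a choice made for this sketch and not necessarily how the chart's trendline was produced.

```python
import math

# Percent of pip downloads from >=2.7,<2.7.9, oldest first (2016-06 .. 2018-05).
percent = [
    47.7, 46.2, 45.7, 42.4, 38.8, 37.2, 37.0, 36.1, 32.9, 29.5, 29.1, 26.1,
    24.0, 24.1, 22.8, 21.3, 19.7, 18.5, 17.6, 17.7, 15.9, 15.2, 15.1, 14.0,
]


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


months = list(range(len(percent)))  # 0 = 2016-06
lin_slope, lin_intercept = least_squares(months, percent)
log_slope, log_intercept = least_squares(months, [math.log(p) for p in percent])


def linear(t):
    return lin_intercept + lin_slope * t


def exponential(t):
    return math.exp(log_intercept + log_slope * t)


# Compare the two extrapolations 24, 36 and 48 months after 2016-06.
for t in (24, 36, 48):
    print(t, round(linear(t), 1), round(exponential(t), 1))
```

The straight line eventually predicts a negative share, which is impossible, while the exponential keeps shrinking but never reaches zero.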
Yeah, it does demonstrate that the decline is slowing.
Running this BigQuery again, for the past 6 months:
yyyymm | percent_lt279 | download_count |
---|---|---|
2019-10 | 3.5 | 1919909312 |
2019-09 | 3.7 | 3214786202 |
2019-08 | 4.2 | 3076675931 |
2019-07 | 4.5 | 3033745291 |
2019-06 | 5.3 | 2729569347 |
2019-05 | 5.9 | 2757051716 |
2019-04 | 6.2 | 828207478 |
It's less than 5% now, so I'm happy to completely drop support in our next release.
@pradyunsg general follow up to what I was on about way earlier: is this about removing TLS 1.0 support or explicitly checking and rejecting Python <2.7.9? If the former, can you query against the TLS version used rather than the Python version reported?
@nicktimko I think the biggest thing is just simply reducing the surface area of support we have. Generally we decide this on an X.Y basis, because that draws the cleanest lines around major changes. However, due to 2.7's age, 2.7.9 is a bit of a special case in that it introduced a backported ssl module from Python 3.
This work would effectively allow us to take better advantage of the capabilities of that backported ssl module. For instance, we can use the configuration settings so that pip itself will not function with older versions of TLS, we can possibly start relying on the platform trust stores in more situations, etc.
But really the biggest thing is narrowing the supported configurations.
To @pradyunsg I think sub 5% is a perfectly fine point to drop support for older versions of 2.7.
Filed #7362, where we can discuss how we'd do the removal. If someone wants to discuss if, when or why, please continue to do so in this issue.