sagemath/sage

After #30053, sphinx 3.1.2 does not build on ubuntu-{trusty,xenial,bionic}, debian-jessie, centos-7 (again)

Closed this issue · 58 comments

kliem commented

sage-system-python is the Python that we use to download and unpack upstream archives. We allow a very wide range of Python versions.

After #30053 undid the change in #29033, we are again running into locale / encoding problems with early (pre-3.7) versions of Python used as sage-system-python.

This is a blocker ticket for Sage 9.2 because it is a severe regression in platform support compared to Sage 9.1.

Originally reported on this ticket:

/sage/build/bin/sage-spkg: line 73: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8): No such file or directory
/sage/build/bin/sage-spkg: line 73: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
Found local metadata for sphinx-3.0.4.p0
bash: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
Attempting to download package Sphinx-3.0.4.tar.gz from mirrors
http://mirrors.xmission.com/sage/spkg/upstream/sphinx/Sphinx-3.0.4.tar.gz
[......................................................................]
sphinx-3.0.4.p0
====================================================
Setting up build directory for sphinx-3.0.4.p0
bash: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8)
Traceback (most recent call last):
  File "/sage/build/bin/sage-uncompress-spkg", line 23, in <module>
    run()
  File "/sage/build/bin/../sage_bootstrap/uncompress/cmdline.py", line 72, in run
    unpack_archive(archive, dirname)
  File "/sage/build/bin/../sage_bootstrap/uncompress/action.py", line 68, in unpack_archive
    archive.extractall(members=archive.names)
  File "/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 96, in extractall
    **kwargs)
  File "/usr/lib64/python3.6/tarfile.py", line 2010, in extractall
    numeric_owner=numeric_owner)
  File "/usr/lib64/python3.6/tarfile.py", line 2052, in extract
    numeric_owner=numeric_owner)
  File "/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 122, in _extract_member
    **kwargs)
  File "/usr/lib64/python3.6/tarfile.py", line 2122, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/lib64/python3.6/tarfile.py", line 2163, in makefile
    with bltn_open(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 45: ordinal not in range(128)
************************************************************************
Error: failed to extract /sage/upstream/Sphinx-3.0.4.tar.gz
************************************************************************
Please email sage-devel (http://groups.google.com/group/sage-devel)
explaining the problem and including the log file
  /sage/logs/pkgs/sphinx-3.0.4.p0.log
Describe your computer, operating system, etc.
************************************************************************

See e.g. https://github.com/sagemath/sage/runs/811777719?check_suite_focus=true

But now (https://github.com/mkoeppe/sage/runs/1141520147) we see it for ubuntu-trusty-minimal (where sage-system-python is python 3.4).

We fix this problem by making locale fixes in the sage-system-python script. See e.g. https://github.com/mkoeppe/sage/runs/1142310012 for a run with the present ticket.

Depends on #30053

CC: @vbraun @dimpase @orlitzky @jhpalmieri @seblabbe

Component: packages: standard

Author: Matthias Koeppe

Branch/Commit: ff0dbc6

Reviewer: Jonathan Kliem, Sébastien Labbé

Issue created by migration from https://trac.sagemath.org/ticket/30008

comment:1

That's a very strange error

comment:4

Perhaps #30001 happens to fix it?

kliem commented
comment:6

No, doesn't help. At least not as it is now.

kliem commented
comment:7

According to this

https://www.noreplied.com/how-to-fixed-cannot-change-locale-utf-8-error-in-centos-7/

one should add LC_CTYPE="en_US.UTF-8" to /etc/environment. Very unsatisfying fix for someone without root privileges.

This appears to be a bug with centos 7, which is merely exposed now.

comment:8

Just exporting this environment variable setting in build/bin/sage-spkg should be enough. No need for a systemwide setting

kliem commented
comment:9

Ok. Testing this now. https://github.com/kliem/sage/pull/19/checks

comment:10

apparently should be fixed by #30053

kliem commented
comment:11

It appears that things fixed itself: https://github.com/sagemath/sage/actions/runs/166309975

comment:12

In any case, let's take care of all locale issues in #30053.

Reviewer: Matthias Koeppe

slel commented
comment:14

Also the upgrade to Sphinx 3.1.2 is complete at #30001 and was merged in Sage 9.2.beta5.

Dependencies: #30053

Description changed:

--- 
+++ 
@@ -1,3 +1,8 @@
+`sage-system-python` is the Python that we use to download and unpack upstream archives.  We allow a very wide range of Python versions.
+
+After #30053 undid the change in #29033, we are again running into locale / encoding problems with early (pre-3.7) versions of Python used as `sage-system-python`.
+
+Originally reported on this ticket:
 
 ```
 /sage/build/bin/sage-spkg: line 73: warning: setlocale: LC_ALL: cannot change locale (C.UTF-8): No such file or directory
@@ -42,3 +47,11 @@
 ```
 
 See e.g. https://github.com/sagemath/sage/runs/811777719?check_suite_focus=true
+
+But now (https://github.com/mkoeppe/sage/runs/1141520147) we see it for `ubuntu-trusty-minimal` (where `sage-system-python` is python 3.4).
+
+We fix this problem by making locale fixes in the `sage-system-python` script.
+
+
+
+

Commit: 9403629

Changed reviewer from Matthias Koeppe to https://github.com/mkoeppe/sage/actions/runs/264540655

Author: Matthias Koeppe

New commits:

9403629build/bin/sage-system-python: Work around LC_ALL=C
comment:21

On Ubuntu 18.04 + beta12 + #30053 (+ #30606, the other branch I am working on now), I do not confirm this issue for building sphinx: sage -f sphinx works correctly. version 3.1.2.p0 to be more precise.

comment:22

the error in the ticket description comes from Python 3.6. Is this outdated, in view of #30053, as the locale is staying C locale?

What is the commit in comment:20 doing? I am lost.

comment:23

Replying to @dimpase:

the error in the ticket description comes from Python 3.6. Is this outdated, in view of #30053, as the locale is staying C locale?

build/bin/sage-system-python finds the Python that we use for downloading and unpacking packages.

It is unrelated to the python that we try to find in build/pkgs/python3/spkg-configure.m4 (which rejects Python 3.6 after #30053).

build/bin/sage-system-python can use any of the Pythons listed in build/tox.ini. In particular, it can be an early 3.x series Python. For these, LC_ALL=C disables UTF-8 operation, which causes the errors described on this ticket.

Look at the GH Actions links in the ticket description to see it in action.

Description changed:

--- 
+++ 
@@ -50,8 +50,7 @@
 
 But now (https://github.com/mkoeppe/sage/runs/1141520147) we see it for `ubuntu-trusty-minimal` (where `sage-system-python` is python 3.4).
 
-We fix this problem by making locale fixes in the `sage-system-python` script.
+We fix this problem by making locale fixes in the `sage-system-python` script. See e.g. https://github.com/mkoeppe/sage/runs/1142310012 for a run with the present ticket.
 
 
 
-
comment:25

Ready for review

comment:26

Replying to @seblabbe:

On Ubuntu 18.04 + beta12 + #30053 (+ #30606, the other branch I am working on now), I do not confirm this issue for building sphinx: sage -f sphinx works correctly. version 3.1.2.p0 to be more precise.

You probably have a better version of python in the front of the path than what is the default on Ubuntu 18.04. You can check this with build/bin/sage-system-python --version.

comment:27

Why again are we trying to extract a tarball with python?

The problem here isn't really the C locale, it's that Sphinx ships files with non-ascii characters in them...

$ ls tests/roots/test-images/ | tail -n 1
-rw-r--r-- 1 mjo mjo  65K 2020-08-13 11:46 testimäge.png

and we don't specify an encoding, which ultimately leaves it up to the locale.

Ignoring the fact that reimplementing this is absurd, we should be able to set the filename encoding to utf8 here without touching the locale at all.

comment:28

Replying to @orlitzky:

Why again are we trying to extract a tarball with python?

This question is not within the scope of this ticket

comment:29

Replying to @orlitzky:

The problem here isn't really the C locale

Yes, it is specifically the problem with setting the C locale explicitly. This is explained well in https://www.python.org/dev/peps/pep-0538/

comment:30

Replying to @orlitzky:

we should be able to set the filename encoding to utf8 here without touching the locale at all.

We are using the tarfile module as supplied by the system python.

Description changed:

--- 
+++ 
@@ -1,6 +1,8 @@
 `sage-system-python` is the Python that we use to download and unpack upstream archives.  We allow a very wide range of Python versions.
 
 After #30053 undid the change in #29033, we are again running into locale / encoding problems with early (pre-3.7) versions of Python used as `sage-system-python`.
+
+This is a blocker ticket for Sage 9.2 because it is a severe regression in platform support compared to Sage 9.1.
 
 Originally reported on this ticket:
 
comment:33

I'm not willing to install a version of python from 2014 to test it, but setting tarfile.ENCODING = 'utf-8' probably solves this in a much better way, since we know what the encoding is.

comment:34

Replying to @orlitzky:

I'm not willing to install a version of python from 2014 to test it

You don't have to. We have the GH Actions to test it. The link with the runs and results is in the "Reviewer field".

comment:35

Replying to @orlitzky:

setting tarfile.ENCODING = 'utf-8' probably solves this in a much better way, since we know what the encoding is.

No, that is absolutely not the correct fix because the default (obtained using sys.getfilesystemencoding() or sys.getdefaultencoding()) is correct. See https://docs.python.org/2.6/library/tarfile.html#tar-unicode (Python 2.6 is the earliest version that we support for sage-system-python).

The specific problem is that if someone puts a LC_ALL=C, the default encodings are set to ASCII. This is what causes the breakage, and this is what the current ticket fixes.

Needs review.

comment:36

Setting the locale to C causes the default guess to be wrong, but we know the correct encoding. And the utf-8 encoding is available "everywhere," unlike the locales that we're guessing at in your branch. Finally, changing the system locale to affect one tarfile is like setting your house on fire to stop the faucet from leaking.

comment:37

Replying to @orlitzky:

Finally, changing the system locale to affect one tarfile is like [...]

It is very localized, and it undoes the damage done by #30053's sage-spkg doing exactly what you seem to be criticizing here - namely setting the system locale for all its subprocesses.

comment:38

Replying to @orlitzky:

we know the correct encoding.

This is of course a good point. We can just decide that our tarballs have utf-8 pathnames. I will test the approach with setting tarfile.ENCODING.

Nevertheless, removing the C locale setting in sage-system-python would still be good because there could be other parts that are affected by this unsuitable setting.

comment:40

And #30053 just put things back the way they were before we tried to switch to C.UTF-8. We need to set some locale, because otherwise the behavior of standard utilities in spkg-install and spkg-check scripts is unknowable. For example, this branch can choose the en_US.utf8 locale. How do grep ranges work there? According to https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html,

In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’. In other locales, the sorting sequence is not specified, and ‘[a-d]’ might be equivalent to ‘[abcd]’ or to ‘[aBbCcDd]’, or it might fail to match any character, or the set of characters that it matches might even be erratic. To obtain the traditional interpretation of bracket expressions, you can use the ‘C’ locale by setting the LC_ALL environment variable to the value ‘C’.

Mucking with this is bad luck.

comment:41

Replying to @orlitzky:

And #30053 just put things back the way they were before we tried to switch to C.UTF-8.

Yes, it was a good change for that ticket, which is why I agreed with the change there.

But it affects the Pythons that run below it.
(1) the sage-system-python - which the present ticket addresses
(2) the python3 from which we build our venv -- which is yet to be addressed in #30576 if we want to support python 3.6.

Branch pushed to git repo; I updated commit sha1. New commits:

ff0dbc6build/sage_bootstrap/uncompress/tar_file.py: Fix encoding to utf-8

Changed commit from 9403629 to ff0dbc6

comment:45

Needs review.

comment:46

Replying to @mkoeppe:

You probably have a better version of python in the front of the path than what is the default on Ubuntu 18.04. You can check this with build/bin/sage-system-python --version.

I have:

$ build/bin/sage-system-python --version
Python 2.7.17

I don't know how to properly review this ticket. With ubuntu-bionic 18.04, I do not have problem. So what I can do is to check that with the branch, I still don't have a problem. Should I do some kind of make distclean? For now, I just pulled that branch on top of freshly built beta13, and I am currently running make ptestlong. Will report later.

comment:47

The tests have already run on GH Actions. It is not necessary to run them again to review this ticket.

comment:48

The problem shows on machines where sage-system-python is python 3.x with x < 7.

It's probably not a good idea to try to uninstall python2 to review this ticket, but you could try to shadow it

comment:49

Replying to @mkoeppe:

you could try to shadow it

tell me how to shadow, I will try that tomorrow. How can I make sage-system-python to use:

$ which python3.6
/usr/bin/python3.6

?

comment:50
mkdir /somewhere && ln -s /usr/bin/python3.6 /somewhere/python 
export PATH=/somewhere:$PATH
comment:51

Doesn't the tarball commit alone fix issue without the locale voodoo?

comment:52

It does fix the issue, but it is best not to pass a locale to python 3.6 that we know to break things.

kliem commented
comment:53

Seems to work fine and looks good to me.

kliem commented

Changed reviewer from https://github.com/mkoeppe/sage/actions/runs/265872338, ... to Jonathan Kliem

comment:54

Replying to @mkoeppe:

mkdir /somewhere && ln -s /usr/bin/python3.6 /somewhere/python 
export PATH=/somewhere:$PATH

Indeed, the above shadowing method works:

$ build/bin/sage-system-python --version
Python 3.6.9

I can now reproduce the issue with Ubuntu 18.04 + 9.2.beta13 and I confirm that the current branch fixes it.

Changed reviewer from Jonathan Kliem to Jonathan Kliem, Sébastien Labbé

comment:55

Thanks everyone!