rhasspy/larynx

Package `larynx-tts_0.5.0_amd64.deb` installs but fails to run on older systems

follower opened this issue · 6 comments

Problem

The package larynx-tts_0.5.0_amd64.deb installs on Elementary OS 5.1 (which is based on Ubuntu 18.04 LTS which is based on Debian ~buster/sid*) but the supplied python3 binary/larynx script fails to run due to an issue related to libc versioning.

$ larynx --help
python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by python3)

Workaround

I'd recently encountered this issue with another project, so I was able to work around it in the interim by extracting a package with a later version of libc and helping things find what they were looking for. (Waves hands here.)

Cause

Anyway, as far as I'm aware, this issue occurs because the Larynx package is built on a machine with a more recent libc version than the one installed locally.

Which I think is confirmed by this line in the docker config:

FROM debian:buster-slim as python37
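
To check whether a given machine is affected before installing, the local glibc version can be compared against the one the bundled binary asks for. Below is a minimal Python sketch, assuming a glibc-based Linux system. The helper names are made up for illustration, and the 2.28 requirement is taken from the error message above.

```python
import os

# Version the bundled python3 binary asks for, per the error message above.
REQUIRED_GLIBC = (2, 28)


def parse_version(text):
    """Turn a string such as 'glibc 2.27' or '2.28' into a tuple like (2, 27)."""
    number = text.split()[-1]
    return tuple(int(part) for part in number.split("."))


def local_glibc_version():
    """Return the running glibc version as a tuple, or None if it is unavailable."""
    try:
        return parse_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, ValueError, OSError):
        # Not a glibc system, or the value is not exposed on this platform.
        return None


def is_new_enough(local, required=REQUIRED_GLIBC):
    """True when the local glibc is at least the required version."""
    return local is not None and local >= required
```

On Ubuntu 18.04 derived systems `local_glibc_version()` should report something like `(2, 27)`, so `is_new_enough` would return `False`.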

Options for resolving issue

In terms of "resolving" the issue:

  • Ideally the package could be built on an older base system docker image so older machines could still run it successfully. (As I understand it, I think the only libc version changes are related to some optimisations but I don't know if they impact Larynx's performance.)
  • Alternatively the package could be configured with version information that would prevent installation on older, incompatible systems, unless manually overridden.

I'll admit I didn't really expect the Larynx package to ship its own Python binary instead of depending on system packages but I assume that's to ensure compatibility with compiled extensions?

Appreciation

Despite this issue I was able to get up and running with Larynx after applying the workaround and overall am very happy with the initial resulting output.

Thanks for all the work you've put into the project, I'm really excited about the potential that high quality, free & open source offline text to speech technology brings with it!

Thanks!

Hi @follower, thanks for the detailed feedback!

I had issues getting updates for some of the older releases at one point, so I switched to buster. Ultimately, the problem comes down to the onnxruntime dependency. It looks like bullseye finally has a python3-onnx package, so I can depend on that going forward at least. For now, though, I need the compiled extension which is always bound to a particular version of Python.

To make things harder, the official onnxruntime wheels don't support 32-bit ARM. So even if I could do a pip install during the Debian package installation, it won't work unless I maintain builds for all Python versions (3.6-3.9+). I had to build my version for Python 3.7 on an actual Pi, and it took a day or two!

If you have any suggestions for getting around these problems, I'd love to hear them. These same issues with compiled Python extensions crop up in most of my projects, so any help would be much appreciated 🙂

Thanks for taking the time to read & reply. :)

I'm back with an update after further research & testing, first up...


TL;DR:

On Ubuntu 18.04-LTS derived systems it seems that it is sufficient to first install Python 3.7[0] (e.g. via apt with):

sudo apt install python3.7

And then (with the Larynx .deb installed), it should now be possible to run Larynx successfully with:

PYTHONPATH=/usr/lib/larynx-tts:/usr/lib/larynx-tts/usr/local/lib/python3.7/site-packages/ python3.7 -m larynx -v en "Hello."

Downsides to this workaround

Unfortunately it's still not possible to run /usr/bin/larynx directly as it (intentionally) alters the value of PATH so that /usr/lib/larynx-tts/usr/local/bin/python3.7 is found & used before any other installed version.

The /usr/lib/larynx-tts is required in PYTHONPATH in order for the larynx module to be found & the /usr/lib/larynx-tts/usr/local/lib/python3.7/site-packages/ is required so that gruut & other packages are found.
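
The same invocation can be wrapped in a small launcher to avoid typing the long PYTHONPATH each time. This is only a sketch: the function names are hypothetical, and the paths and the `python3.7 -m larynx -v en` command line are copied from the command above.

```python
import os
import subprocess

# Paths as laid out by the Larynx .deb described above.
LARYNX_ROOT = "/usr/lib/larynx-tts"
SITE_PACKAGES = LARYNX_ROOT + "/usr/local/lib/python3.7/site-packages/"


def build_env(base_env=None):
    """Copy the environment and point PYTHONPATH at the packaged modules."""
    env = dict(os.environ if base_env is None else base_env)
    # First entry: where the larynx module lives.
    # Second entry: where gruut and the other dependencies live.
    env["PYTHONPATH"] = ":".join([LARYNX_ROOT, SITE_PACKAGES])
    return env


def build_command(text, voice="en"):
    """Command line equivalent to the manual workaround."""
    return ["python3.7", "-m", "larynx", "-v", voice, text]


def run_larynx(text, voice="en"):
    """Run Larynx with the system python3.7 instead of the bundled binary."""
    return subprocess.run(build_command(text, voice), env=build_env(), check=True)
```

Calling `run_larynx("Hello.")` should then behave like the one-liner above, provided `python3.7` is on PATH.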

Possible .deb changes for a fix

This suggests to me that perhaps the .deb could depend on a system python3.7 and the precompiled library binaries will still be compatible.

If the pre-compiled python3.7 binary was removed from the .deb then I think the larynx/larynx-server scripts might be able to be used unchanged except for replacing python3 with python3.7 on the last line. (Although maybe we'd still have to handle /usr/lib/larynx-tts/usr/local/lib/python3.7/site-packages/ specifically--not sure whether it'll just get found automatically as a result of the other changes to various path configs in the script.)

So, while a bit verbose, and not ideal, this is a straightforward enough workaround to get things working for me.

[0] For me, apt show python3.7 currently displays:

$ apt show python3.7
Package: python3.7
Version: 3.7.5-2~18.04.4
Priority: optional
Section: universe/python
Origin: Ubuntu
[...]

Will follow-up with another comment providing a bit more background...

[Other than for someone with an idle interest in this issue the following probably isn't particularly necessary to read/write but, what can I say, I'm a completionist. :D ]

The underlying problem

As reported originally, the error message displayed is:

python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by python3)

While there are a number of tools we could use to investigate this further, I tend toward readelf these days. Using that to look for symbols that mention GLIBC_2.28, we get:

$ readelf --all /usr/lib/larynx-tts/usr/local/bin/python3 | grep "2\.28"
0000002a95d8  00bd00000007 R_X86_64_JUMP_SLO 0000000000000000 fcntl64@GLIBC_2.28 + 0
   189: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fcntl64@GLIBC_2.28 (15)
  7672: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fcntl64@@GLIBC_2.28
  0bc:   2 (GLIBC_2.2.5)   f (GLIBC_2.28)   10 (GLIBC_2.6)     2 (GLIBC_2.2.5)
  0x00d0:   Name: GLIBC_2.28  Flags: none  Version: 15

So, in this case it seems there's only one symbol affected: fcntl64 (which is a plus, because when I've previously encountered this issue there were math-related symbols affected too).
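
For binaries with more affected symbols, picking them out of the readelf output by eye gets tedious. Here is a rough Python sketch that extracts the symbol names bound to a given glibc version. The function name and regular expression are my own and assume the `name@GLIBC_x.y` / `name@@GLIBC_x.y` notation shown in the output above.

```python
import re

# Matches e.g. "fcntl64@GLIBC_2.28" and "fcntl64@@GLIBC_2.28".
VERSIONED_SYMBOL = re.compile(r"(\w+)@{1,2}(GLIBC_\d+(?:\.\d+)*)")


def symbols_requiring(readelf_output, version):
    """Return the sorted, de-duplicated symbol names tied to `version`."""
    found = set()
    for match in VERSIONED_SYMBOL.finditer(readelf_output):
        name, symbol_version = match.groups()
        if symbol_version == version:
            found.add(name)
    return sorted(found)


# Three of the lines from the readelf output above.
SAMPLE = """\
0000002a95d8  00bd00000007 R_X86_64_JUMP_SLO 0000000000000000 fcntl64@GLIBC_2.28 + 0
   189: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fcntl64@GLIBC_2.28 (15)
  7672: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fcntl64@@GLIBC_2.28
"""

print(symbols_requiring(SAMPLE, "GLIBC_2.28"))
```

Feeding it the full output of `readelf --all` on the binary (rather than the grep-filtered lines) should work the same way, since lines without a matching version are ignored.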

What even is a fcntl64 or a fcntl?

Why, it's a function to "manipulate file descriptor" (fcntl64 / fcntl), also known as a "kitchen sink". :)

The associated man pages go on to note:

The original Linux fcntl() system call was not designed to handle large file offsets (in the flock structure). Consequently, an fcntl64() system call was added in Linux 2.4. The newer system call employs a different structure for file locking, flock64, and corresponding commands, F_GETLK64, F_SETLK64, and F_SETLKW64. However, these details can be ignored by applications using glibc, whose fcntl() wrapper function transparently employs the more recent system call where it is available. [Emphasis mine.]

However, it seems like the concluding comment is in some ways overly optimistic, because (as I understand it) in libc 2.28 the fcntl() function definition was changed to be a preprocessor macro that simply defines fcntl to be fcntl64--a change that is apparently not backwardly(?) compatible when the compiled binary is run on older systems.

This is apparently a "known issue" and allegedly intentional.

Whatever can we do?

Well, in our case, install a version of python3.7 built on the older system. (See above. :) )

But, were that not an option, apparently it is/may be possible to write a wrapper function that would enable the binary to run on older systems--but with various levels of cautions against reliability/undefined behaviour.

Given that the underlying "issue" originates in a different project (i.e. Python) I decided there probably wasn't much point pursuing a wrapper-based workaround in this case.

Related links

For completeness, here's some of the references I used/encountered while researching this:

This appears to be the most discussion I've seen about (semi-)related issues on the libc side, and seems to provide useful context about symbol versioning: "Evolution of ELF symbol management"

Addendum

Part of the reason why I've included the link dump above (outside of completionism :) ) is that the more I looked into it, the less convinced I am that this breakage is intentional.

Now, it may be that libc just treats such breakage as "expected" & intentional when such a change occurs and so doesn't explicitly mention it because it should be "obvious" based on project norms--or I might not be reading between the lines to see where such breakage is implied as intentional.

However, there are aspects that stand out:

  1. The change in question appears to include code that is explicitly related to compatibility.
  2. There is no explicit discussion/notification about the impact, e.g. that code compiled against fcntl() silently uses fcntl64() instead & thus doesn't work on older systems.
    • The commit in question mentions "for architectures which defines __USE_FILE_OFFSET64, fcntl64 will aliased to fcntl and no adjustment would be required.", "A new LFS fcntl64 is added on default ABI [maybe this is the "implying compatibility issues" part] with the usual macros to select it for FILE_OFFSET_BITS=64." and "Keep a compat symbol with old broken semantic". It also mentions "The idea follows other LFS interfaces that provide two symbols".

    • This is also the case in the "News" file for the 2.28 release which mentions:

      The fcntl function now have a Long File Support variant named fcntl64. It is added to fix some Linux Open File Description (OFD) locks usage on non LFS mode. As for others *64 functions, fcntl64 semantics are analogous with fcntl and LFS support is handled transparently.

      But other sections go into detail about other functions being deprecated/removed etc.

    • It should also be noted that the fcntl64 functionality isn't new, just that it was transparently invoked via fcntl previously--AIUI.

  3. There seems to be no related bug in https://sourceware.org/bugzilla/query.cgi despite multiple complaints about the changes in a number of places online.

Anyway, I'm curious about what the reality of the situation is and--given I've run into similar issues twice in the past couple of months--I suspect this is probably not the last time I'll encounter it. So hopefully these notes will prove helpful if that happens.

[Feel free to close this issue when we've handled the Larynx-specific aspect to your satisfaction.]

Edited to add:

For (further :D ) completeness here is the workaround referred to in the original issue comment:

  1. Download e.g. libc6_2.31-0ubuntu9.2_amd64.deb from http://archive.ubuntu.com/ubuntu/pool/main/g/glibc/.
  2. Extract (not install) the contents of the package into a new sub-directory with, e.g.:
    dpkg-deb --vextract libc6_2.31-0ubuntu9.2_amd64.deb try__libc6_2_31/
  3. (Larynx-specific step.) Configure the environment with:
    source /usr/bin/larynx
    (Ignore the version GLIBC_2.28 not found error).
  4. Hopefully, running this will now succeed:
    ./try__libc6_2_31/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --library-path ./try__libc6_2_31/lib/x86_64-linux-gnu /usr/lib/larynx-tts/usr/local/bin/python3 -m larynx --voice en "Hello! I lib c you!"

What you're doing (AIUI) is executing the later version of the dynamic linker/loader (a.k.a. interpreter) directly (rather than the path stored inside the binary) and telling it (a) where to find the library files it expects to find and (b) what the name of the executable it should run is.

I assume there are reasons why this won't work in all cases but it's worked for both of the ones I've run into recently.
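
To make the shape of that final command easier to see, here is a small Python sketch that assembles it from its parts. The function name is hypothetical, and the `lib/x86_64-linux-gnu` layout and loader filename are the ones from the extracted Ubuntu amd64 package in the steps above.

```python
def loader_command(extracted_root, program, args):
    """Build an argv list that runs `program` via the extracted dynamic loader.

    extracted_root: directory the libc6 .deb was extracted into.
    program:        the binary to run (here, the bundled python3).
    args:           arguments to pass through to that binary.
    """
    lib_dir = extracted_root.rstrip("/") + "/lib/x86_64-linux-gnu"
    loader = lib_dir + "/ld-linux-x86-64.so.2"
    # (a) --library-path tells the loader where to find the newer libraries,
    # (b) the next argument names the executable it should run.
    return [loader, "--library-path", lib_dir, program, *args]


command = loader_command(
    "./try__libc6_2_31/",
    "/usr/lib/larynx-tts/usr/local/bin/python3",
    ["-m", "larynx", "--voice", "en", "Hello! I lib c you!"],
)
```

The resulting list can be passed to `subprocess.run(command)` once the environment from step 3 is in place.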

Related links

As far as I can tell, none of the easily found "answers" to the question of how to handle this style of error suggest just extracting the files from the .deb (rather than installing them [inadvisable!], compiling from source [slow] or copying an arbitrary number of files from another machine [prone to error]), so hopefully this is useful to someone.

My recent Debian packages are built for Debian bullseye without including an internal Python interpreter. Do you think it's worth building packages for older systems, or is installing from source easy enough?

Somewhat related, but:

  • larynx -h
    ModuleNotFoundError: No module named 'gruut'

  • PYTHON_PATH=/usr/lib/larynx-tts/lib/python3.9/site-packages larynx -h
    ModuleNotFoundError: No module named 'regex'

  • PYTHONPATH=/usr/lib/larynx-tts python3 -m larynx -h
    ModuleNotFoundError: No module named 'onnxruntime'

  • PYTHONPATH=/usr/lib/larynx-tts:/usr/lib/larynx-tts/lib/python3.9/site-packages python3 -m larynx -h
    ModuleNotFoundError: No module named 'pycrfsuite._pycrfsuite'

  • PYTHON_PATH=$HOME/.local/lib/python3.8/site-packages:/usr/lib/larynx-tts/lib/python3.9/site-packages:/usr/lib/larynx-tts/lib/python3.9/site-packages:/usr/lib/python3/dist-packages larynx -h
    ModuleNotFoundError: No module named 'onnxruntime.capi.onnxruntime_pybind11_state'

With a package (larynx-tts 1.1.0 on Ubuntu 20.04 and Python 3.8) I would expect the installation to be more... fluent.
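
When chasing errors like these one at a time, it can help to list every missing module in one go. Note that Python reads the `PYTHONPATH` variable; `PYTHON_PATH` is not consulted. Below is a minimal diagnostic sketch. The function name is hypothetical, and the module list is simply the set of names from the errors above.

```python
import importlib.util

# Module names taken from the ModuleNotFoundError messages above.
MODULES_TO_CHECK = [
    "gruut",
    "regex",
    "onnxruntime",
    "pycrfsuite._pycrfsuite",
    "onnxruntime.capi.onnxruntime_pybind11_state",
]


def missing_modules(names):
    """Return the names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package is missing or is in a broken state.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Running `missing_modules(MODULES_TO_CHECK)` with the same interpreter and PYTHONPATH used to start Larynx should show which of the packaged dependencies that interpreter can actually see.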