besser82/libxcrypt

Replace awk and shell scripts with Python or Perl?

Closed this issue · 17 comments

zackw commented

Quite some time ago I rewrote our large and unwieldy collection of build-time shell and awk scripts in Python 3: 7a2ce7d. (Then I got a little overambitious and tried to get rid of libtool as well, and then the day job intervened and I haven't touched the code in months.)

I would like to dust this patch off and merge it, but it has occurred to me that it might make life easier for our downstream redistributors if the language used was Perl instead of Python. Automake is written in Perl, so that interpreter is already part of the bootstrap set, where Python might not be.

I'm personally more fluent in Python, but I'd be fine with either, all told.

@solardiz @ldv-alt @fweimer @rfc1036 I know you've all worked on redistribution issues in the past, what do you think?

(N.B. The existing Python script ka-table-gen.py would not be rewritten in Perl, because as far as I know there is no Perl equivalent of passlib. But that script is not run during a normal build, so it doesn't matter.)

Thank you for your concern. In Debian/Ubuntu a significant subset of Perl is an Essential package, so because of its pervasiveness in the distribution it is available from the early bootstrap of a port. Python would probably require a long period of cross-compilation of the package.

@zackw In Owl, we chose not to have Python in the distro at all. This took some effort to untie a few components from Python (e.g., RPM), but luckily there were not too many and the ties were not too tight. Since then, I hear recent glibc requires Python for build? I don't know what we'd have done if we wanted to revitalize and update Owl. I guess needing to bite the Python bullet would be yet another reason not to continue with the project. Anyway, now that's history.

For libxcrypt, if it's only to be built on systems with glibc, I guess Python might be acceptable if glibc has a similar dependency. A question might be which version of Python. You appear to require 3.6+ already? Some still maintained LTS distros might not have 3.6 except in an extra packages repository. glibc doesn't appear to require 3.6+ yet: https://sourceware.org/glibc/wiki/Style_and_Conventions#Python_usage_conventions "Require python-2.7, but be compatible with python-3.4+ (scripts only used by glibc developers may require python 3)".

Should libxcrypt be usable on systems without glibc, though? Perhaps yes. And perhaps that's a reason against Python then.

Do the awk and shell scripts really need replacing, and why? I'd say, don't fix what's not broken. Sure I'd prefer Perl over Python, but maybe there's no need for the rewrite in the first place?

> The existing Python script ka-table-gen.py would not be rewritten in Perl, because as far as I know there is no Perl equivalent of passlib. But that script is not run during a normal build, so it doesn't matter.

That's fine. As for a Perl equivalent of passlib, there's https://metacpan.org/pod/Authen::Passphrase

I don't have an opinion here. All distributions I have worked on have both Python and Perl, and do not have any issues building packages with either.

zackw commented

I never answered this question:

> Do the awk and shell scripts really need replacing, and why?

They are functional as is, but it is difficult to modify them, and they're at about the limit for what I consider practical to achieve in sh+awk. For instance, the crypt.conf work stalled out partially because I wanted to machine-generate the parser from tables of valid values for each parameter for each hash, but extending hashes.conf with more fields is just too difficult as long as each consumer of hashes.conf is its own self-contained awk script with its own quirks. In the pull request I'm about to post, which switches all the scripts over to Perl, there is a single unified parser for hashes.conf, in a library file, so it will be much easier to extend or even completely change the format.
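To make the "single unified parser" idea concrete, here is a minimal sketch in Python (the pull request mentioned above uses Perl). The four-column layout, the sample rows, and the names `HashEntry`, `parse_hashes_conf`, and `names_with_flag` are assumptions made for illustration; they are not taken from the real hashes.conf or from the libxcrypt scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashEntry:
    """One row of the (assumed) configuration table."""
    name: str
    prefix: str
    nrbytes: int
    flags: frozenset


def parse_hashes_conf(text):
    """Parse hashes.conf-style text into a list of HashEntry records.

    Assumed format: one entry per line, four whitespace-separated
    fields (name, prefix, nrbytes, comma-separated flags). Blank lines
    and lines starting with '#' are skipped.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected 4 fields, got {len(fields)}"
            )
        name, prefix, nrbytes, flags = fields
        entries.append(
            HashEntry(name, prefix, int(nrbytes), frozenset(flags.split(",")))
        )
    return entries


def names_with_flag(entries, flag):
    """Example consumer: every script reuses the same parsed records."""
    return [e.name for e in entries if flag in e.flags]


# Illustrative sample data, not a copy of the real file.
SAMPLE = """\
# name     prefix  nrbytes  flags
yescrypt   $y$     14       STRONG,DEFAULT
bcrypt     $2b$    16       STRONG,ALT
"""

if __name__ == "__main__":
    parsed = parse_hashes_conf(SAMPLE)
    print(names_with_flag(parsed, "STRONG"))
```

With the parsing in one place, adding a fifth column would mean changing `parse_hashes_conf` and `HashEntry` once, rather than touching every consumer script.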

Newer scripting languages also come with nice tools that do things like catch common errors for you, making future code review easier.

zackw commented

Addressed via #118.

This is an unfortunate change, since perl (or at least some library part of perl) links to libcrypt.so. Is it possible to revert?

Or if reverting is not possible, can you generate things ahead of time for the tarball, or keep generated files in source control?

Note that Python until 3.12 (iirc) also depends on libcrypt.

> Note that Python until 3.12 (iirc) also depends on libcrypt.

What changes in 3.12? I didn't know they were dropping the dep.

> Or if reverting is not possible, can you generate things ahead of time for the tarball, or keep generated files in source control?

AFAIK the release tarballs cover this now.

The configure script contains

# This code must run after $PERL is set.
hashes_enabled=$(
    $PERL "$srcdir"/build-aux/scripts/expand-selected-hashes \
          "$srcdir"/lib/hashes.conf \
          "$hashes_selected"
)
if test x"$hashes_enabled" = x || test x"$hashes_enabled" = x,; then
    as_fn_error $? "bad value '${hashes_selected}' for --enable-hashes" "$LINENO" 5
fi

and I don't see a way to avoid it

> This is an unfortunate change, since perl (or at least some library part of perl) links to libcrypt.so. Is it possible to revert?

I'm sorry, but I don't fully understand your problem. Building libxcrypt never involves an unresolvable dependency cycle: all you need, besides a libc and a (cross-)compiler for $TARGET, is a basic perl >= 5.14 installation on $HOST, which is mandatory to be present on any system conforming to LSB, POSIX, and/or SysV.

Can you give us some more information about your problem, please?

The problem is I don't see how this is supposed to be bootstrapped from sources. Perl needs libcrypt, libcrypt needs Perl. Sure, there are binaries for Perl in minimal system images, but they already depend on libcrypt:

$ docker run ubuntu:22.04 ldd /usr/bin/perl | grep libcrypt
	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f5542748000)

How do you bootstrap from sources?

> The problem is I don't see how this is supposed to be bootstrapped from sources. Perl needs libcrypt, libcrypt needs Perl. Sure, there are binaries for Perl in minimal system images, but they already depend on libcrypt:

libcrypt doesn't need perl as a run-time dependency; perl is needed only at build time to generate and configure some internally used header files. The resulting libcrypt.so is fully self-contained, and doesn't link any objects from perl.

> $ docker run ubuntu:22.04 ldd /usr/bin/perl | grep libcrypt
> 	libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f5542748000)

The libcrypt in this example is already libxcrypt, as Ubuntu 22.04 ships libxcrypt-4.4.27.

> How do you bootstrap from sources?

The build order is: libc (e.g. glibc, using python >= 3.4 available on $HOST during build-time), then libxcrypt (using perl >= 5.14 available on $HOST during build-time), then everything that needs libcrypt.so to be present.

zackw commented

I believe (based on source inspection) that if you build perl on a system where libcrypt is unavailable, the "Configure" script will figure that out for itself and disable the features that depend on it. You'll still get a usable perl executable, it just won't provide a wrapper for crypt(). You can then use that perl to build libcrypt, and then rebuild perl with libcrypt.
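The staged approach described above can be sketched as a small dependency graph, using Python's standard `graphlib` module (Python 3.9+). The step names are illustrative labels chosen for this example, not real package or target names, and the graph is a simplification that ignores everything except libc, perl, and libxcrypt.

```python
from graphlib import TopologicalSorter

# Map each build step to the steps that must be finished first.
# Splitting perl into two stages is what removes the cycle:
# the first perl is configured without crypt() support.
deps = {
    "perl (no crypt)": {"libc"},
    "libxcrypt": {"libc", "perl (no crypt)"},
    "perl (with crypt)": {"libc", "libxcrypt"},
}

# static_order() yields the steps in an order that respects every edge.
order = list(TopologicalSorter(deps).static_order())

if __name__ == "__main__":
    for step in order:
        print(step)
```

Because each step depends on the one before it, this graph has only one valid order: libc, the crypt-less perl, libxcrypt, and finally the rebuilt perl.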

Debian has done a whole bunch of work on figuring out the best way to deal with this and the several other dependency cycles near the beginning of a bootstrap, see https://wiki.debian.org/DebianBootstrap.

Thanks for the comments. Let me just expand a bit about the context:

In Spack we'd like users to build different flavors and versions of packages independently from their distro, kinda like Nix, but more source-build oriented. In a consistent dependency graph we typically allow only one flavor of a package. As such we want to avoid a chain like perl crypt=yes depends on libxcrypt depends on perl crypt=no (two flavors of the same package in a graph, although we could make this happen in the future). We also want to avoid marking libxcrypt as an external system dependency (since then it's not at its latest version, as seen with Ubuntu 22.04 above), and we want to avoid using system Perl (sometimes it's not installed or usable, as a user reports; see footnote 1; and there can be instances where the user does not have access to run the system package manager). The most natural solution in my opinion would be to have libxcrypt drop the build dependency on Perl, since then the dependency graph is simply perl crypt=yes depends on libxcrypt. It would also make life easier for people bootstrapping a Linux distro in general, since it removes a step of building yet another Perl.

Footnotes

  1. Note that distros like CentOS/Rocky 8 package Perl modules individually, so you may have a perl binary but not open.pm, which requires yum install perl-open 🤷‍♂️.
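The one-flavor-per-package constraint described in this comment can be illustrated with a toy graph. This is a sketch using Python's standard `graphlib` (3.9+), not Spack's actual solver; `build_order` and both dictionaries are hypothetical names introduced for the example.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """Return a valid build order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# One node per package (a single flavor each), as in the comment above.
# libxcrypt needs perl at build time, and perl links against libcrypt.
with_perl_build_dep = {
    "perl": {"libxcrypt"},
    "libxcrypt": {"perl"},
}

# The same graph if libxcrypt had no build dependency on perl.
without_perl_build_dep = {
    "perl": {"libxcrypt"},
    "libxcrypt": set(),
}

if __name__ == "__main__":
    print(build_order(with_perl_build_dep))
    print(build_order(without_perl_build_dep))
```

With a single perl node the first graph cannot be ordered at all, which is why the alternatives are either a second perl flavor or removing the edge from libxcrypt to perl.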

@haampie: See spack/spack#33907 (comment) for some explanation about RHEL-based distros; usually the sysadmin isn't opposed to having that fixed.