plk/biber

Running biber from within a script

Closed this issue · 19 comments

To create a lot of examples, I run tons of files via several scripts. Since one of the recent updates of biber/biblatex, I have a problem with biber on macOS Ventura (TL2023) when running the scripts with bash:

...
### pdflatex 04-02-25.ltx2 (1)
### run command (biber 04-02-25)
error: /Library/Developer/CommandLineTools/usr/bin/lipo: can't move temporary file: /var/folders/ww/7jbdn2090q1bdng_1mplhtqr0000gn/T/par-766f7373/cache-4edecc3c88fd313ee8c9289f29dfd8b198e3e70a/thin/biber to file: /var/folders/ww/7jbdn2090q1bdng_1mplhtqr0000gn/T/par-766f7373/cache-4edecc3c88fd313ee8c9289f29dfd8b198e3e70a/thin/biber.lipo (No such file or directory)
biber: extracting x86_64 binary with lipo failed (wstatus=256)
######################################################################
### E R R O R                                                      ###
######################################################################

In the directory .../thin/ there is only a file biber, not biber.lipo.

Running a single example in a terminal is no problem ...

Herbert, that is a problem that has recently come up for a few people who run biber concurrently. biber self-extracts to a temp directory, and macOS's lipo utility then separates the binary for the specific architecture from the universal binary. When two processes do this simultaneously, one of them fails because lipo cannot write the extracted binary to its file. You can easily reproduce this by running biber --help | biber --help, which starts both processes at the same time. Unfortunately, there is no perfect solution.
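A minimal sketch for reproducing the race with more than two processes, assuming bash and a biber that has not been extracted yet (a fresh install, or after its cache has been removed), since lipo only runs during that first extraction. The helper name run_concurrently is hypothetical; it starts N copies of a command in the background and prints how many of them exited with a non-zero status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start N copies of a command at the same time and
# print how many of them failed. The output of each copy is discarded.
run_concurrently() {
  local n=$1
  shift
  local i pid failures=0
  local pids=()
  for ((i = 0; i < n; i++)); do
    "$@" >/dev/null 2>&1 &
    pids+=("$!")
  done
  # Wait for every background job and count non-zero exit statuses.
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}
```

For example, run_concurrently 8 biber --help; a non-zero count on a fresh install points to the lipo race described above.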

I've devised a workaround for my project: I set the environment variable PAR_GLOBAL_TEMP to a document-specific directory on every run of biber. So each of my ~20 documents gets its own extracted instance of biber, and thus its own copy of the universal binary, so that lipo works exclusively on a single binary. This is feasible in my case since the project consists of ~20 documents, not an arbitrary number. They are processed with latexmk, and each run of latexmk is prefixed with the variable declaration PAR_GLOBAL_TEMP=/tmp/bibercache/<doc_specific_dirname>. Of course this is semi-automated in a Makefile, so there is no need to type it every time.
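The per-document setup can be sketched as a small bash wrapper. This is an illustration, not the actual Makefile: the cache root /tmp/bibercache comes from the comment above, while the function names biber_cache_dir and build_doc, the BIBER_CACHE_ROOT/LATEXMK overrides, and the latexmk -pdf invocation are assumptions.

```shell
#!/usr/bin/env bash
# Root for the per-document biber caches (path taken from the comment above).
BIBER_CACHE_ROOT=${BIBER_CACHE_ROOT:-/tmp/bibercache}

# Hypothetical helper: map a document such as chapters/intro.tex to its own
# cache directory, e.g. /tmp/bibercache/intro.
biber_cache_dir() {
  local name
  name=$(basename "$1")
  echo "$BIBER_CACHE_ROOT/${name%.*}"
}

# Hypothetical helper: build one document with a private PAR_GLOBAL_TEMP so
# that this document's biber gets its own extracted copy (and its own lipo run).
build_doc() {
  local doc=$1 dir
  dir=$(biber_cache_dir "$doc")
  mkdir -p "$dir"
  PAR_GLOBAL_TEMP="$dir" "${LATEXMK:-latexmk}" -pdf "$doc"
}
```

Each distinct directory costs a full extracted biber, which is where the storage overhead mentioned below comes from.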

The price for this is paid in storage space: Each expanded instance of biber takes up ~300 MB, so I waste approximately 6 GB of SSD space on biber caches. Not an ideal solution, but workable to a certain extent.

If you're interested, I can provide further details on the workaround.

plk commented

This isn't ideal, I agree, but it has to be done in the extraction phase: PAR::Packer can't deal directly with universal binaries, so we first have to unpack the right architecture (hence the extra unpacking space) and then re-execute the PAR::Packer bootstrapping on the thin binary. This could be avoided if PAR::Packer handled universal binaries natively, but I haven't had time to look into it. The env var method that @krumeich mentions works, and you can always do a final rm -rf `biber --cache` to clean up after every individual run.
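The env var method plus cleanup can be wrapped in a helper that gives one biber run a throwaway PAR_GLOBAL_TEMP and deletes it afterwards. A minimal sketch, assuming bash and mktemp -d; instead of calling biber --cache, it removes the directory it created itself, on the assumption that this is where biber extracts when PAR_GLOBAL_TEMP is set. The name run_biber_isolated and the BIBER override are hypothetical. Every call pays the full extraction cost, so this is slow for large batches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run biber with a private, temporary PAR_GLOBAL_TEMP and
# remove that directory afterwards, preserving biber's exit status.
run_biber_isolated() {
  local tmp status
  tmp=$(mktemp -d "${TMPDIR:-/tmp}/biber-par.XXXXXX") || return 1
  PAR_GLOBAL_TEMP="$tmp" "${BIBER:-biber}" "$@"
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Usage: run_biber_isolated 04-02-25 in place of biber 04-02-25.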

I see, and I am wondering why it worked in the past.
However, in some cases I have more than 1000 examples and run up to 16 of them in parallel in separate xterms.
Deleting the cache every time is no real solution; it takes too much time.

I experimented a bit and copied the unpacked binary from .../thin/biber over the existing packed biber in TL/bin/, leaving the temporary files in /var/folders/.../ untouched, and now it works!

Not a good solution, but I'll see if it can be handled by my script. If not, I will use Alexander's or your solution.

plk commented

Ah, yes, that will work: you can just use a non-universal binary and you'll be fine. You can get one from SourceForge (SF) if you're on Intel, or unpack it from the universal binary yourself with lipo if you're on ARM. That's all biber does itself on first run anyway.

This sounds good. I suppose one would have to do this only when a new version of biber is installed. Since this is not too often, it could be a viable workaround for users who want to use concurrent instances of biber.
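Since the copy only has to be redone when a new biber arrives, a script can first check whether the installed binary is still universal. A minimal sketch, assuming bash and the two forms of lipo -info output that appear in this thread ("Architectures in the fat file: …" for a universal binary, "Non-fat file: …" for a thin one); the function names and the LIPO override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide from `lipo -info` output whether a file is a
# universal ("fat") binary. Returns 0 for fat, 1 otherwise.
is_fat_info() {
  case $1 in
    "Architectures in the fat file:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: returns 0 if the given biber is still universal (so the
# thin binary should be extracted again), 1 if it is already thin, and 2 if
# lipo itself failed.
biber_needs_thinning() {
  local info
  info=$("${LIPO:-lipo}" -info "$1" 2>/dev/null) || return 2
  is_fat_info "$info"
}
```

For example: if biber_needs_thinning "$(command -v biber)"; then echo "redo the thin copy"; fi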

@hvoss49's method works for me. I copied the binary for ARM from biber's extracted directory to TeX Live's bin directory:

sudo cp $(biber --cache)/thin/biber /usr/local/texlive/2023/bin/universal-darwin/

There is a catch, though. As has been established in #368 (and following that #401), macOS regularly cleans out directories under /var/folders/…, thus removing biber's extracted caches. If you've overwritten the unextracted biber binary in TeX Live's bin directory, you're stuck and need to reinstall biber. This way you can recreate the extracted cache directory. So @hvoss49's approach should be accompanied by one of the following steps (there certainly are more possibilities):

  • Setting PAR_GLOBAL_TEMP to a custom directory that does not get cleaned out by macOS (e.g. export PAR_GLOBAL_TEMP="$HOME/.bibercache"; a quoted "~" is not expanded by the shell), or
  • Copying the extracted/lipoed binary to a different directory than TeX Live's bin directory (with higher precedence in the PATH), or
  • Saving a copy of the unextracted biber binary before overwriting it.
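The copy and the backup from the last option can be combined in one step, keeping the universal binary next to the thin one (similar to the biber.packed file in Herbert's later directory listing). A minimal sketch, assuming bash; the function name is hypothetical. It only backs up the current biber when it differs from the thin binary, so repeating the call does not overwrite the backup with an already-thinned copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace <bindir>/biber with a thin binary, keeping the
# previous (universal) binary as <bindir>/biber.packed.
replace_with_thin_biber() {
  local bindir=$1 thin=$2
  [ -f "$thin" ] || { echo "no thin binary at $thin" >&2; return 1; }
  # Back up only if biber is not already the thin binary (e.g. after a
  # biber update has put a new universal binary in place).
  if ! cmp -s "$thin" "$bindir/biber"; then
    cp -p "$bindir/biber" "$bindir/biber.packed" || return 1
  fi
  cp "$thin" "$bindir/biber" && chmod 755 "$bindir/biber"
}
```

Call it with the TeX Live bin directory (e.g. /usr/local/texlive/2023/bin/universal-darwin) and the thin binary at "$(biber --cache)/thin/biber"; writing into TeX Live's tree usually requires root.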

So this workaround is a viable solution for users who aren't afraid to dig a little deeper. The best solution, however, would be for PAR::Packer to improve its handling of universal binaries.

What happens if you delete biber's temp folder (in my case: /var/folders/ww/7jbdn2090q1bdng_1mplhtqr0000gn/T/par-766f7373) and then run the extracted biber saved in the default TeX binary dir?

In my case it creates a new directory with all the files, and everything is fine. Maybe I'm missing the important point ...

plk commented

Normally, there is one temp cache folder which biber unpacks itself to on first run of a binary with a hash different from any previous biber. However, if you run the universal binary, it first unpacks itself, runs lipo to extract the correct thin binary and then runs the thin binary which, as it has a different hash than the (larger) universal binary, will unpack itself again in a different place which will persist over all runs of the same thin binary. You can delete the cache from the universal binary with no impact if you are using the thin binary directly.
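Following this explanation, the universal binary's cache is the one containing a thin/ subdirectory, as in the path from the original error message. A minimal sketch that lists such cache directories, assuming bash, the par-*/cache-*/thin/biber layout seen in that log, and $TMPDIR as the default root; the function name is hypothetical. Review the output before deleting anything, and only do so once you run the thin binary directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PAR cache directories that belong to the universal
# biber, recognised by the thin/biber file its lipo step leaves behind.
universal_biber_caches() {
  local root=${1:-${TMPDIR:-/tmp}} dir
  for dir in "$root"/par-*/cache-*/; do
    # If the glob matches nothing, $dir is the literal pattern and -f fails.
    [ -f "${dir}thin/biber" ] && printf '%s\n' "${dir%/}"
  done
  return 0
}
```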

By the way, if you just want to simplify things, you can extract the thin binary from the universal without running it. You can see the thin binary architectures in the universal binary like this:

lipo -info /usr/local/texlive/2023/bin/universal-darwin/biber
Architectures in the fat file: /usr/local/texlive/2023/bin/universal-darwin/biber are: x86_64 arm64 

and you can extract the thin binary you want like this (say you want the x86_64 Intel thin binary):

lipo -extract_family x86_64 -output <some_path_to_thin_biber> /usr/local/texlive/2023/bin/universal-darwin/biber

This way there will be no separate cache for the universal binary and the thin binary will just work as expected.
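To avoid hard-coding the architecture, the extraction can follow the machine type. A minimal sketch, assuming bash and that uname -m reports arm64 on Apple Silicon and x86_64 on Intel (a shell running under Rosetta 2 also reports x86_64). It swaps in lipo -thin, which writes a single-architecture file, for -extract_family; the function names and the LIPO override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -m` output to the lipo architecture name.
lipo_arch_for() {
  case $1 in
    arm64) echo arm64 ;;
    x86_64) echo x86_64 ;;
    *) echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: write the thin biber for this machine to <output>.
extract_thin_biber() {
  local universal=$1 output=$2 arch
  arch=$(lipo_arch_for "$(uname -m)") || return 1
  "${LIPO:-lipo}" -thin "$arch" -output "$output" "$universal"
}
```

For example: extract_thin_biber /usr/local/texlive/2023/bin/universal-darwin/biber <some_path_to_thin_biber>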

I used the already unpacked version:

iMac:~ voss$ lipo -info /usr/local/texlive/current/bin/universal-darwin/biber
Non-fat file: /usr/local/texlive/current/bin/universal-darwin/biber is architecture: x86_64

and in my texlive/bin:

iMac:~ voss$ ls -la /usr/local/texlive/current/bin/universal-darwin/biber*
-rwxr-xr-x  1 voss  staff   40263952 11 Apr 21:05 /usr/local/texlive/current/bin/universal-darwin/biber
-rwxr-xr-x  1 voss  staff  101082976  8 Mär 13:59 /usr/local/texlive/current/bin/universal-darwin/biber-ms
-rwxr-xr-x  1 voss  staff  101075728  6 Mär 23:29 /usr/local/texlive/current/bin/universal-darwin/biber.packed
-rwxr-xr-x  1 voss  staff   40263952 11 Apr 21:05 /usr/local/texlive/current/bin/universal-darwin/biber.unpacked

plk commented

Should be fine. The biber-ms is the multi-script version (version 4.0) of biber which should work just as well (with biblatex-ms ...)

Yes, all is fine ... more or less :-)
I'll ask Norbert if there exists a "postaction" option for TL packages which can do the lipo run.

Maybe you've already done this ...

plk commented

Can also just provide separate x86_64 and arm64 binaries, but I was explicitly asked to combine them into a universal binary for simplicity ...

I rest my case. :-) Didn't know that the extraction is performed twice, once on the universal binary and then again on the thin binary.

Now I wonder how the unpacking process copes with a custom PAR_GLOBAL_TEMP. As @plk said: "[…] will unpack itself again in a different place which will persist over all runs of the same thin binary" (emphasis mine). When I force a specific directory name by setting PAR_GLOBAL_TEMP, wouldn't that lead to problems during the second unpacking? The unpacking process would then have to clean out the same directory that its source file resides in. Anyway, good to see that we have an even simpler workaround.

It is possible to create a short file biber.pl in the directory /usr/local/texlive/2023/tlpkg/tlpostcode/ which, for example, checks the OS and platform and then only runs lipo and copies the result. In the package file biber.tlpobj
you only need the line

postaction script file=tlpkg/tlpostcode/biber.pl

to execute the postcode.

plk commented

@hvoss49, I'm not sure about this. Unfortunately, lipo is not guaranteed to be installed unless the macOS developer tools are present. If it's missing, the postaction would throw an error during the TL install, which might be worse than just throwing an error when running biber ...

True, but isn't lipo used anyway when I run biber to get the correct version?

plk commented

lipo is only used if the unpacked biber (i.e. first run of new biber) is a universal binary with more than one arch inside, which would be the case after a TL install.

fyuniv commented

I had the same error. After searching online, the solution for me was to install the Xcode command line tools by running xcode-select --install in the terminal.
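Because biber's first run needs lipo, a batch script can check for the command line tools up front instead of failing halfway through. A minimal sketch for macOS, assuming bash and that xcode-select -p exits with a non-zero status when no developer directory is installed; it checks xcode-select rather than command -v lipo on the assumption that /usr/bin/lipo may exist as a stub without the tools. The function name and the XCODE_SELECT override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the command line tools (which provide lipo)
# appear to be installed; otherwise print how to install them.
have_command_line_tools() {
  if "${XCODE_SELECT:-xcode-select}" -p >/dev/null 2>&1; then
    return 0
  fi
  echo "Command line tools not found; run: xcode-select --install" >&2
  return 1
}
```

For example, put have_command_line_tools || exit 1 at the top of a batch script before the first biber call.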

plk commented

Closing this for now as we have documented workarounds.