CAST-genomics/haptools

pgenlib/numpy binary incompatibility issue

problem

After installing haptools with pgenlib, importing pgenlib may fail with the following error.

$ python -c "import pgenlib"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "src/pgenlib/pgenlib.pyx", line 1, in init pgenlib
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject.
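
To tell this failure apart from a plain missing install, a small helper along these lines can classify what happens on import. This is only a sketch: `diagnose_import` and `installed_numpy_version` are hypothetical names (not part of haptools or pgenlib), and it assumes the problem always surfaces as a `ValueError` whose message mentions "binary incompatibility", as in the traceback above.

```python
import importlib
from importlib import metadata


def diagnose_import(module_name: str = "pgenlib") -> str:
    """Try to import a module and classify the outcome.

    Returns "ok", "missing", or "binary-incompatible".
    """
    try:
        importlib.import_module(module_name)
    except ValueError as err:
        # Assumption: the size-mismatch error always carries this phrase,
        # as it does in the traceback from this issue.
        if "binary incompatibility" in str(err):
            return "binary-incompatible"
        raise
    except ImportError:
        # Covers ModuleNotFoundError as well.
        return "missing"
    return "ok"


def installed_numpy_version() -> str:
    """Return the installed numpy version, or "not installed"."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print(f"numpy {installed_numpy_version()}: pgenlib import is {diagnose_import()}")
```

If this reports "binary-incompatible", the short-term solution below should apply.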

short-term solution

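The quick solution is to 1) install a version of numpy >= 1.20.0 and then 2) force pgenlib to be recompiled against it. The --no-build-isolation flag makes pip build against the numpy that is already installed in the environment, which is why Cython has to be installed first.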

pip install 'numpy>=1.20.0' Cython
pip install --force-reinstall --no-cache-dir --no-build-isolation pgenlib

steps to reproduce

# first, create a new conda env with a new version of numpy
mamba create -y -n haptools-binary-issue -c conda-forge 'pip>=22.2.2' gxx_linux-64 'numpy>=1.20' 'python=3.9'
conda activate haptools-binary-issue

# install haptools and build pgenlib with the new version of numpy
pip install --no-binary pgenlib 'haptools[files]'

# downgrade numpy to a version < 1.20, because the binary incompatibility was introduced in 1.20
# v1.19.3 is the lowest supported numpy for python3.9 according to "oldest-supported-numpy"
pip install 'numpy==1.19.3'

# this should result in a binary compatibility error:
python -c "import pgenlib"

background

When pip installs a package, it has to handle two sets of dependencies for that package: its build-time and run-time dependencies. The build-time dependencies are needed only while pip compiles the package during installation. The run-time dependencies are needed whenever the package is imported and used. The pgenlib package uses numpy as a build-time dependency, and haptools uses numpy and pgenlib as run-time dependencies.

A binary incompatibility can occur when the version of numpy used to build pgenlib is newer than the version used when running pgenlib. In numpy v1.20.0, an incompatible change was introduced to its C API (source), which is why we see this issue crop up when the build version of numpy is at least 1.20.0 and the run-time version is less than 1.20.0. The reverse case (building against an older numpy and running with a newer one) is generally fine, since numpy aims to keep extensions built against older releases working with newer ones.
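
As a rough illustration of that condition, the check below flags the combination used in the steps to reproduce: built against numpy 1.20.0 or later, run with something earlier. It is a sketch only. The function names are made up for this example, it assumes plain numeric version strings like "1.19.3", and it models just the 1.20.0 boundary discussed in this issue rather than numpy's compatibility rules in general.

```python
ABI_CHANGE = (1, 20, 0)  # the numpy release named in this issue


def parse_version(text: str) -> tuple:
    """Turn "1.19.3" into (1, 19, 3), padding short versions with zeros.

    Assumes purely numeric release versions (no "rc1", ".post1", etc.).
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # "1.20" -> [1, 20, 0]
    return tuple(parts[:3])


def crosses_abi_change(build_numpy: str, runtime_numpy: str) -> bool:
    """True if built with numpy >= 1.20.0 but run with numpy < 1.20.0."""
    return parse_version(build_numpy) >= ABI_CHANGE > parse_version(runtime_numpy)


if __name__ == "__main__":
    # Mirrors the reproduction: build with a recent numpy, then downgrade.
    print(crosses_abi_change("1.23.5", "1.19.3"))
    # Built against an old numpy, run with a newer one.
    print(crosses_abi_change("1.19.3", "1.23.5"))
```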

long-term solution

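The recommended solution online seems to be for pgenlib to list oldest-supported-numpy as a build-time dependency. That meta-package pins the oldest numpy release that supports each Python version, so pip will build pgenlib against that release and the resulting binary should stay compatible with any newer numpy installed at run time. I've opened PR chrchang/plink-ng#229 for this purpose.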

closing because chrchang/plink-ng#229 was merged 🎉

The fix won't go into effect until another release of pgenlib is pushed to PyPI. Feel free to reopen this issue if the problem persists after that.

update: v0.81.3 of pgenlib was released with oldest-supported-numpy as a build dependency, so haptools will now require pgenlib>=0.81.3 as of 916af07
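
For anyone still on an older environment, a quick way to see whether the installed pgenlib already includes the fix might look like the following. Treat it as a sketch: `has_fixed_pgenlib` and `release_tuple` are hypothetical helpers, and it assumes the distribution can be looked up under the name "pgenlib" via `importlib.metadata`.

```python
import re
from importlib import metadata

MIN_PGENLIB = (0, 81, 3)  # first release with the fix, per this issue


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. "0.81.3.post1" -> (0, 81, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_fixed_pgenlib(version=None) -> bool:
    """True if the given (or installed) pgenlib is at least 0.81.3."""
    if version is None:
        try:
            # Assumption: the package is registered as "pgenlib".
            version = metadata.version("pgenlib")
        except metadata.PackageNotFoundError:
            return False
    return release_tuple(version) >= MIN_PGENLIB


if __name__ == "__main__":
    print("pgenlib includes the fix:", has_fixed_pgenlib())
```

If this prints False, upgrading pgenlib (or applying the short-term solution) should be enough.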