Support pip binary wheel manylinux installations
dazza-codes opened this issue · 24 comments
rasterio, shapely, pyproj etc. now provide pure-pip installations with binary wheels that provide manylinux [1] binaries for gdal, proj, geos etc.; can this project also provide binary wheel installations? e.g. see
Potentially Shared Libs
Any way it could work would be great, but it might be optimal if some common binary libs for gdal/ogr could be shared among the various python libraries that require them. (The same could apply to proj/pyproj.) That is, not an OS-installed shared lib, but a python manylinux [1] binary shared lib (in e.g. {project-venv-path}/lib/, alongside {project-venv-path}/lib/python3.6).
[1] https://github.com/pypa/manylinux
What appears to be happening is that several related but independent pypi projects each install their own copies of various possibly-shared libs, e.g. rasterio and fiona both install their own copies of gdal_data into e.g.
.../lib/python3.6/site-packages/fiona/gdal_data/*
.../lib/python3.6/site-packages/rasterio/gdal_data/*
The same applies to proj_data, i.e.
.../lib/python3.6/site-packages/fiona/proj_data/*
.../lib/python3.6/site-packages/rasterio/proj_data/*
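To put a number on the duplicated data directories, something like the following sketch could be run against a venv. It is only an illustration: the function name `data_dir_sizes` is made up here, and the directory names `gdal_data` and `proj_data` are taken from the paths quoted above.

```python
import os


def data_dir_sizes(site_packages, names=("gdal_data", "proj_data")):
    """Map each bundled data directory (relative path) to its size in bytes.

    `names` is an assumption based on the paths shown in this issue.
    Symlinked directories are skipped so that shared copies are not counted twice.
    """
    sizes = {}
    for root, dirs, _files in os.walk(site_packages):
        for dirname in dirs:
            if dirname not in names:
                continue
            path = os.path.join(root, dirname)
            if os.path.islink(path):
                continue
            total = 0
            for sub, _subdirs, files in os.walk(path):
                for filename in files:
                    total += os.path.getsize(os.path.join(sub, filename))
            sizes[os.path.relpath(path, site_packages)] = total
    return sizes


if __name__ == "__main__":
    import sysconfig

    for rel_path, size in sorted(data_dir_sizes(sysconfig.get_paths()["purelib"]).items()):
        print(f"{size:>12}  {rel_path}")
```

Each package that bundles its own copy shows up as a separate entry, which makes the overlap easy to see.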
For rasterio alone, the binary packages are substantial, e.g.
0 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/
376744 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libopenjp2-8f6da918.so.2.3.0
2240704 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libgeos--no-undefined-b94097bf.so
16792 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libsz-978a1c7f.so.2.0.1
1165464 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libsqlite3-a9c9c58e.so.0.8.6
203512 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libnghttp2-11cb20b8.so.14.17.1
44424 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libjson-c-0c137dce.so.2.0.2
87848 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libz-a147dcb0.so.1.2.3
445144 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libwebp-dc7313d0.so.7.0.5
23783432 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libgdal-b29a5f73.so.20.5.4
1576712 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libnetcdf-02a36646.so.13.1.1
182056 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libhdf5_hl-308f82c1.so.100.1.2
3855616 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libhdf5-9d9b49cc.so.103.1.0
3620912 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libcurl-ea538880.so.4.4.0
175000 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libexpat-c4a93fc7.so.1.6.8
218400 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libpng16-de469eac.so.16.35.0
354216 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libgeos_c-a68605fd.so.1.13.1
33624 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libaec-21547b1b.so.0.0.10
250488 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libjpeg-3b10b538.so.9.3.0
453488 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libproj-cd06b982.so.12.0.0
When combined with a few more related projects, there is some potential duplication of the libs and closer inspection of the lib-versions suggests there could be some inconsistency in the versions installed (without any explicit pip options to control those binary lib versions packaged), e.g.
0 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/
87856 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libz-fiona-a147dcb0.so.1.2.3
60800 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libjson-c-fiona-5f02f62c.so.2.0.2
21884960 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libgdal-fiona-9fe15c06.so.20.5.4
354224 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libgeos_c-fiona-a68605fd.so.1.13.1
3620920 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libcurl-fiona-ea538880.so.4.4.0
1261392 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libsqlite3-fiona-25a4bc97.so.0.8.6
175008 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libexpat-fiona-c4a93fc7.so.1.6.8
203520 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libnghttp2-fiona-11cb20b8.so.14.17.1
2240712 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libgeos--no-undefined-fiona-b94097bf.so
344704 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libjpeg-fiona-3fe7dfc0.so.9.3.0
279840 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libpng16-fiona-898afbbd.so.16.35.0
453488 2020-10-13 10:23 python/lib/python3.6/site-packages/Fiona.libs/libproj-fiona-cd06b982.so.12.0.0
0 2020-10-13 10:23 python/lib/python3.6/site-packages/pyproj/.libs/
92080 2020-10-13 10:23 python/lib/python3.6/site-packages/pyproj/.libs/libz-eb09ad1d.so.1.2.3
1271584 2020-10-13 10:23 python/lib/python3.6/site-packages/pyproj/.libs/libsqlite3-b65a32f0.so.0.8.6
8155504 2020-10-13 10:23 python/lib/python3.6/site-packages/pyproj/.libs/libproj-d352b7c6.so.15.2.1
0 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/
376744 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libopenjp2-8f6da918.so.2.3.0
2240704 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libgeos--no-undefined-b94097bf.so
16792 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libsz-978a1c7f.so.2.0.1
1165464 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libsqlite3-a9c9c58e.so.0.8.6
203512 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libnghttp2-11cb20b8.so.14.17.1
44424 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libjson-c-0c137dce.so.2.0.2
87848 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libz-a147dcb0.so.1.2.3
445144 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libwebp-dc7313d0.so.7.0.5
23783432 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libgdal-b29a5f73.so.20.5.4
1576712 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libnetcdf-02a36646.so.13.1.1
182056 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libhdf5_hl-308f82c1.so.100.1.2
3855616 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libhdf5-9d9b49cc.so.103.1.0
3620912 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libcurl-ea538880.so.4.4.0
175000 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libexpat-c4a93fc7.so.1.6.8
218400 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libpng16-de469eac.so.16.35.0
354216 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libgeos_c-a68605fd.so.1.13.1
33624 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libaec-21547b1b.so.0.0.10
250488 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libjpeg-3b10b538.so.9.3.0
453488 2020-10-13 10:23 python/lib/python3.6/site-packages/rasterio.libs/libproj-cd06b982.so.12.0.0
0 2020-10-13 10:23 python/lib/python3.6/site-packages/numpy.libs/
92080 2020-10-13 10:23 python/lib/python3.6/site-packages/numpy.libs/libz-eb09ad1d.so.1.2.3
30077440 2020-10-13 10:23 python/lib/python3.6/site-packages/numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
2260064 2020-10-13 10:23 python/lib/python3.6/site-packages/numpy.libs/libgfortran-2e0d59d6.so.5.0.0
263992 2020-10-13 10:23 python/lib/python3.6/site-packages/numpy.libs/libquadmath-2d0c479f.so.0.0.0
0 2020-10-13 10:23 python/lib/python3.6/site-packages/shapely/.libs/
2240704 2020-10-13 10:23 python/lib/python3.6/site-packages/shapely/.libs/libgeos--no-undefined-b94097bf.so
354216 2020-10-13 10:23 python/lib/python3.6/site-packages/shapely/.libs/libgeos_c-a68605fd.so.1.13.1
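The duplication in the listing above can be summarized mechanically. The sketch below groups bundled libraries by name across all `*.libs` and `.libs` directories. Two things in it are assumptions inferred only from the file names shown in this issue, not from any auditwheel documentation: the `<stem>-<8 hex digits><suffix>` naming pattern, and the `-fiona` vendor tag. The function names are made up for illustration.

```python
import os
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the listing in this issue:
#   <stem>-<8 hex digits><suffix>, e.g. libgdal-b29a5f73.so.20.5.4
BUNDLED_LIB = re.compile(r"^(?P<stem>.+)-(?P<tag>[0-9a-f]{8})(?P<suffix>\..+)$")


def canonical_name(filename, vendor_tags=("-fiona",)):
    """Return (stem, suffix) without the hash and vendor tag, or None if no match."""
    match = BUNDLED_LIB.match(filename)
    if match is None:
        return None
    stem = match.group("stem")
    for tag in vendor_tags:  # "-fiona" is an assumption based on the listing
        if stem.endswith(tag):
            stem = stem[: -len(tag)]
    return stem, match.group("suffix")


def bundled_copies(site_packages):
    """Map library stem -> sorted [(suffix, path), ...] for stems bundled more than once."""
    copies = defaultdict(list)
    for root, _dirs, files in os.walk(site_packages):
        # Matches both "rasterio.libs" and "pyproj/.libs" style directories.
        if not os.path.basename(root).endswith(".libs"):
            continue
        for name in files:
            parsed = canonical_name(name)
            if parsed is not None:
                stem, suffix = parsed
                copies[stem].append((suffix, os.path.join(root, name)))
    return {stem: sorted(found) for stem, found in copies.items() if len(found) > 1}
```

A stem that appears with more than one suffix (for example the two libproj versions listed above) is a candidate for the version inconsistency described here; a stem that appears several times with the same suffix is plain duplication.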
The reason for raising this request on this project is that it is very close to the C source used by some common python wrappers, so it might provide a "source of truth" about how to package some common libs using pip/manylinux wheels. (The same request could perhaps apply to the C source for proj.) Obviously the manylinux builds would need to provide version-specific builds. (I don't have a clear idea of how they would be provided as shared libs, only that the duplications and inconsistencies observed above might benefit from some kind of shared-libs solution that is not an OS package solution, despite how much work goes into those.)
can this project also provide binary wheel installations?
See #2166 (comment)
BTW, for questions GDAL uses the mailing list https://lists.osgeo.org/mailman/listinfo/gdal-dev
This issue is more of a feature request for packaging than a question.
This request for optimized, shared libs is also related to stripping binaries for a smaller footprint, see also
Packaging GDAL is a significant effort. For Python users, while this is not equivalent to the pip experience, Conda is probably the best way to go, since it provides PROJ and GDAL binaries shared among several packages.
This feature request is specific to pip installations using binary wheels; although the conda option is useful, it's irrelevant to resolving this specific feature request, unless something about that solution can be applied in some way to a common, shared pip installation. A pointer to the details of that solution might be helpful here. Clearly, Fiona and rasterio have already solved some of the problem of packaging libgdal for python-pip installations, so this feature request is only about providing a common denominator for any packages that consume a common binary library. A possible solution is to fork and adapt https://github.com/rasterio/rasterio-wheels for the gdal project and then rasterio/Fiona might depend on a common pip-gdal dependency that provides a binary libgdal built with https://github.com/matthew-brett/multibuild (?). (Similarly for libgeos, libproj, etc., I guess.)
Clearly, Fiona and rasterio have already solved some of the problem of packaging libgdal for python-pip installations, so this feature request is only about providing a common denominator for any packages that consume a common binary library
I'm not sure the rasterio team has an appetite to collaborate on this, since they see the wheels as a competitive advantage over the current situation of gdal-python, which doesn't provide binary wheels. If there's no collaboration, that could result in some Fiona/rasterio version using one GDAL version and gdal-python using another, and people loading both at runtime would get clashes/crashes.
Conda based approaches look to me like a more solid approach than ad-hoc pip wheels where GDAL dependencies aren't necessarily updated. Wondering to which extent taking the .so, .dylib, .dll from Conda and repackaging them to be wheel compatible wouldn't save some duplication of work ?
Anyway, this would need a champion to lead the effort.
My experiments on this currently lead to a post-pip install hack like (don't do this at home):
hack_shared_libs () {
    site=$1
    export GDAL_DATA="${site}/share/gdal_data"
    export PROJ_DATA="${site}/share/proj_data"
    mkdir -p "${GDAL_DATA}"
    mkdir -p "${PROJ_DATA}"
    export SHARED_LIBS="${site}/share/libs"
    mkdir -p "${SHARED_LIBS}"
    find "${site}" -type d -name 'gdal_data' | while read -r data_path; do
        if [ "$data_path" != "$GDAL_DATA" ]; then
            rsync -auq "$data_path"/ "$GDAL_DATA"/
            rm -rf "$data_path"
            ln -s "$GDAL_DATA" "$data_path"
        fi
    done
    find "${site}" -type d -name 'proj_data' | while read -r data_path; do
        if [ "$data_path" != "$PROJ_DATA" ]; then
            rsync -auq "$data_path"/ "$PROJ_DATA"/
            rm -rf "$data_path"
            ln -s "$PROJ_DATA" "$data_path"
        fi
    done
    # Updating the LD_LIBRARY_PATH can fix symbol resolution
    export LD_LIBRARY_PATH="$SHARED_LIBS:$LD_LIBRARY_PATH"
    move_to_shared_libs () {
        lib_path=$1
        if [ -d "$lib_path" ]; then
            rsync -auq "$lib_path"/ "$SHARED_LIBS"/
            rm -rf "$lib_path"
            ln -s "$SHARED_LIBS" "$lib_path"
        fi
    }
    move_to_shared_libs "$site"/rasterio.libs
    move_to_shared_libs "$site"/Fiona.libs
    move_to_shared_libs "$site"/numpy.libs
    move_to_shared_libs "$site"/pyproj/.libs
    move_to_shared_libs "$site"/shapely/.libs
    # TODO: remove this hack on shapely/geos.py
    # due to https://github.com/Toblerity/Shapely/issues/1013
    # try a hack to patch shapely/geos.py
    patch "$site"/shapely/geos.py "$SCRIPT_PATH"/patches/shapely/geos.patch
    # # To check for missing symbols, use:
    # find "$SHARED_LIBS"/ -name "*.so*" | while read lib_name; do
    #     ldd -r "$lib_name" 2>&1
    # done
}
Using it requires setting some env-vars like:
package_dst=$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')
# experimental option:
hack_shared_libs "$package_dst"
# these env-vars are required after running hack_shared_libs
export GDAL_DATA="${package_dst}/share/gdal_data"
export PROJ_DATA="${package_dst}/share/proj_data"
export LD_LIBRARY_PATH="${package_dst}/share/libs:$LD_LIBRARY_PATH"
This is subsequently tested by running a project pytest suite on the modified venv site-packages that has the share/libs modifications. I'm not proposing any general use of such a hack, just noting that the experiment works (but requires a patch on shapely/geos.py to find libgeos OK).
I fully get that there's possible reluctance to work on it and preferences to use conda vs. pure-pip and various competition among packages, but we all stand on the shoulders of giants in one way or another and all the packaging solutions get better in various ways, so an open mind to all the evolution is useful. I don't expect this to be resolved anytime soon and if there is some kind of perceived pressure to close issues promptly, so be it, but otherwise it might help to leave it open/unresolved. I can't go out on a limb to try to do a bunch of work myself on it unless there is support for it. While a solid solution would be ideal, all I can do is hack something for now. I don't know enough about the details of both the conda-packaging and the pip-packaging/multibuild/wheels to see a simple solution right away - open to useful pointers.
While this hack is nasty, there is something appealing about the shared-libs experiment:
$ cd /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/
$ ls -l rasterio.libs Fiona.libs numpy.libs shapely/.libs pyproj/.libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 Fiona.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 numpy.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 pyproj/.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 rasterio.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 shapely/.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
$ ls -l share/libs/
total 110804
-rwxr-xr-x 1 joe joe 35656 Oct 30 21:53 libaec-f0d4887b.so.0.0.10
-rwxr-xr-x 1 joe joe 3532904 Oct 30 21:53 libcurl-ea538880.so.4.4.0
-rwxr-xr-x 1 joe joe 3532912 Oct 30 21:53 libcurl-fiona-ea538880.so.4.4.0
-rwxr-xr-x 1 joe joe 222320 Oct 30 21:53 libexpat-09c47d4c.so.1.6.8
-rwxr-xr-x 1 joe joe 172944 Oct 30 21:53 libexpat-fiona-c4a93fc7.so.1.6.8
-rwxr-xr-x 1 joe joe 23787528 Oct 30 21:53 libgdal-044c25e5.so.20.5.4
-rwxr-xr-x 1 joe joe 21884960 Oct 30 21:53 libgdal-fiona-9fe15c06.so.20.5.4
-rwxr-xr-x 1 joe joe 323632 Oct 30 21:53 libgeos_c-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe 323640 Oct 30 21:53 libgeos_c-fiona-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe 2240704 Oct 30 21:53 libgeos--no-undefined-b94097bf.so
-rwxr-xr-x 1 joe joe 2240712 Oct 30 21:53 libgeos--no-undefined-fiona-b94097bf.so
-rwxr-xr-x 1 joe joe 2260064 Oct 30 21:53 libgfortran-2e0d59d6.so.5.0.0
-rwxr-xr-x 1 joe joe 4236544 Oct 30 21:53 libhdf5-4377e0cf.so.103.1.0
-rwxr-xr-x 1 joe joe 186152 Oct 30 21:53 libhdf5_hl-92c1cdd8.so.100.1.2
-rwxr-xr-x 1 joe joe 342720 Oct 30 21:53 libjpeg-3fe7dfc0.so.9.3.0
-rwxr-xr-x 1 joe joe 342720 Oct 30 21:53 libjpeg-fiona-3fe7dfc0.so.9.3.0
-rwxr-xr-x 1 joe joe 58800 Oct 30 21:53 libjson-c-5f02f62c.so.2.0.2
-rwxr-xr-x 1 joe joe 58808 Oct 30 21:53 libjson-c-fiona-5f02f62c.so.2.0.2
-rwxr-xr-x 1 joe joe 1822440 Oct 30 21:53 libnetcdf-07221d8a.so.13.1.1
-rwxr-xr-x 1 joe joe 205616 Oct 30 21:53 libnghttp2-11cb20b8.so.14.17.1
-rwxr-xr-x 1 joe joe 205624 Oct 30 21:53 libnghttp2-fiona-11cb20b8.so.14.17.1
-rwxr-xr-x 1 joe joe 30077440 Oct 30 21:53 libopenblasp-r0-ae94cfde.3.9.dev.so
-rwxr-xr-x 1 joe joe 378776 Oct 30 21:53 libopenjp2-8f6da918.so.2.3.0
-rwxr-xr-x 1 joe joe 281944 Oct 30 21:53 libpng16-898afbbd.so.16.35.0
-rwxr-xr-x 1 joe joe 281952 Oct 30 21:53 libpng16-fiona-898afbbd.so.16.35.0
-rwxr-xr-x 1 joe joe 453488 Oct 30 21:53 libproj-cd06b982.so.12.0.0
-rwxr-xr-x 1 joe joe 8155504 Oct 30 21:53 libproj-d352b7c6.so.15.2.1
-rwxr-xr-x 1 joe joe 453488 Oct 30 21:53 libproj-fiona-cd06b982.so.12.0.0
-rwxr-xr-x 1 joe joe 261912 Oct 30 21:53 libquadmath-2d0c479f.so.0.0.0
-rwxr-xr-x 1 joe joe 1273568 Oct 30 21:53 libsqlite3-b65a32f0.so.0.8.6
-rwxr-xr-x 1 joe joe 1421520 Oct 30 21:53 libsqlite3-bc0a2dd7.so.0.8.6
-rwxr-xr-x 1 joe joe 1259400 Oct 30 21:53 libsqlite3-fiona-25a4bc97.so.0.8.6
-rwxr-xr-x 1 joe joe 18760 Oct 30 21:53 libsz-53d02de5.so.2.0.1
-rwxr-xr-x 1 joe joe 783120 Oct 30 21:53 libwebp-fbd93615.so.7.0.5
-rwxr-xr-x 1 joe joe 85656 Oct 30 21:53 libz-a147dcb0.so.1.2.3
-rwxr-xr-x 1 joe joe 94144 Oct 30 21:53 libz-eb09ad1d.so.1.2.3
-rwxr-xr-x 1 joe joe 85664 Oct 30 21:53 libz-fiona-a147dcb0.so.1.2.3
Wondering to which extent taking the .so, .dylib, .dll from Conda and repackaging them to be wheel compatible wouldn't save some duplication of work ?
https://github.com/conda-incubator/conda-press could be interesting (although work on it seems to have stalled)
although the conda option is useful, it's irrelevant to resolving this specific feature request,
The reference to conda is not that irrelevant, in the sense that conda has been specifically designed to overcome this limitation of wheels to handle (non-python) shared libraries.
As @rouault already mentioned, packaging is a big effort. Moreover, this doesn't only need someone stepping up to do the work in the actual geospatial packages (gdal, (py)proj, geos, rasterio, shapely, etc), but also in the general python packaging ecosystem. Currently, wheels are not designed to link to a specific build of another wheel to share libraries like this (you can pin versions, but not exact builds), and there is no way to know if another wheel is compatible or not. For example, numpy's and scipy's wheels each include their own copy of openblas, and don't share it (it's a similar problem to the one you raise here, and they haven't solved it yet).
There has been talk about improving this situation, see e.g. the "pynativelib" proposal of Nathaniel Smith at pypa/wheel-builders#2, but AFAIK not a lot has changed in the last few years. There is a lot of related discussion at pypa/packaging-problems#25.
I've spent some time to review some links above and still need time to further review details of conda packaging, auditwheel and conda-press. I want to better understand some of the ABI issues and the so called "thin" vs. "fat" library installs and paths, e.g. in my hacked share/libs there appear to be the same .so versions with various prefixes and no obvious way to prune them using some kind of sem-ver dep-tree solution:
-rwxr-xr-x 1 joe joe 23787528 Oct 30 21:53 libgdal-044c25e5.so.20.5.4
-rwxr-xr-x 1 joe joe 21884960 Oct 30 21:53 libgdal-fiona-9fe15c06.so.20.5.4
-rwxr-xr-x 1 joe joe 323632 Oct 30 21:53 libgeos_c-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe 323640 Oct 30 21:53 libgeos_c-fiona-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe 2240704 Oct 30 21:53 libgeos--no-undefined-b94097bf.so
-rwxr-xr-x 1 joe joe 2240712 Oct 30 21:53 libgeos--no-undefined-fiona-b94097bf.so
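One way to check whether any of these same-version copies could be pruned is to compare file contents rather than names. The sizes listed above differ by a few bytes between the plain and `-fiona` copies, which suggests (but does not prove) that they are not byte-identical. A minimal sketch, with made-up function names, that groups the regular files in a directory by SHA-256 digest:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large libraries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_files(directory):
    """Return lists of file names in `directory` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not path.is_symlink():
            by_digest[sha256_of(path)].append(path.name)
    return [names for names in by_digest.values() if len(names) > 1]
```

Only files reported together here would be safe to replace with links to a single copy on content grounds alone; everything else needs an actual ABI or dependency analysis.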
As I understand it at present, the argument is that conda properly solves the shared-libs problems and pip wheels use auditwheel to avoid library conflicts. The build complexity to support all platforms and releases is massive.
I guess my hope in opening this issue on this project was that it might be a common denominator for other pythonic libraries to depend on for a binary libgdal (and similarly for proj, libgeos, etc.). I don't know, but it seems like this would require something like the pynativelib proposal or some kind of symlink solution like the one Ubuntu uses in its alternatives selections [1]. It's difficult to see how this could all avoid replicating an entire package manager solution (conda, apt, yum, etc).
[1] http://manpages.ubuntu.com/manpages/trusty/man8/update-alternatives.8.html
to avoid library conflicts
If libgdal-foo.so and libgdal-bar.so are loaded in the same process, this is going to crash at runtime due to -foo using some symbols of -bar, or the reverse. With DLLs, such symbol mixing seems to be less likely due to how symbol resolution works, but on Linux, such .so hell happens in practice.
The current practice in the rasterio and fiona wheels is to package .so libs within subdirectories of those site-packages, with some filename identifiers for those .so files that attempt to "isolate" the libs from each other. Is it true that loading rasterio and fiona wheels in the same process, which would try to load symbols from separate .so files, could lead to in-memory symbol mixing? (If so, is that an important argument or a necessary practice for using common shared libs? What happens with conda, Debian or other OS installations that install multiple versions of the gdal lib, or is that not possible without symbol conflicts?). Part of the motivation for this issue is to reduce package sizes, but symbol resolution is paramount and AFAIK the current wheel builds (auditwheel) patch a few things to avoid conflicts. I have not yet done a closer study of auditwheel to understand the requirements and patches it applies (I can only assume it works).
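Whether two copies really end up mapped into one process can at least be observed on Linux by reading /proc/self/maps after the imports. The sketch below only reports which files are mapped; it says nothing about whether symbols actually get mixed. The function names are made up for illustration, and the `libgdal` prefix match is an assumption based on the file names shown earlier in this issue.

```python
import os


def loaded_copies(maps_text, stem="libgdal"):
    """Return the sorted, de-duplicated paths in /proc/<pid>/maps text whose
    file name starts with `stem`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if os.path.basename(path).startswith(stem):
            paths.add(path)
    return sorted(paths)


def loaded_copies_in_this_process(stem="libgdal"):
    """Linux only: inspect the mappings of the current process."""
    with open("/proc/self/maps") as fh:
        return loaded_copies(fh.read(), stem)
```

For example, after `import rasterio, fiona` in a venv built from wheels, calling `loaded_copies_in_this_process()` and getting more than one path back would confirm that two separate libgdal files are loaded side by side.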
It would be great to see this issue revived. Installing downstream packages like fiona (in order to install geopandas) that do not supply gdal binaries is a slightly frustrating experience when one is used to working in python virtual environments using pip-compile for dependency/version resolution.
I'm aware of the conda package, but can't use it in our workflow since we don't rely on conda environments. What is the advantage conda has here for packaging that can't be reproduced in pip?
I was surprised to see that rasterio provides the gdal binary as part of their installation. Considering that there will undoubtedly be more packages in the future that rely on gdal, wouldn't it be good practice to provide a pip-installable installation for them?
@thomasaarholt having python gdal wheels will not provide GDAL library for fiona.
I was surprised to see that rasterio provides the gdal binary as part of their installation
rasterio provides gdal libs, but for its own internal use, not the binary.
What is the advantage conda has here for packaging that can't be reproduced in pip?
Conda has the fundamental advantage of being a general package manager, not just a python package manager. So with conda you can install gdal itself, and install python packages that depend on this gdal package.
With pip this is not possible, since it can only deal with python packages. And thus the gdal library needs to be included in a python package that needs it (which is what fiona, rasterio, etc are doing).
See my comment above #3060 (comment) for some more details on that.
Installing downstream packages like fiona (in order to install geopandas) that do not supply gdal binaries
Note that fiona does supply binaries for linux and mac, only not for windows (which is of course an important missing piece, but just to point out).
Oh my god. I'm an idiot-ish. First off, thanks for your prompt replies, they make sense!
On where I was an idiot: I recently switched to an M1 mac, on which I've been trying to install geopandas in a docker image running debian linux. Now this debian image runs on "native" arm architecture. And there (currently) aren't fiona binaries for arm architecture linux... (I'll write an issue on their github).
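For anyone hitting the same thing: a quick way to see why pip skips a wheel is to compare the platform tags in its file name with the machine architecture. The sketch below is deliberately simplified: it only looks at the CPU architecture and ignores the Python, ABI and glibc parts of the tag that pip also checks. The function names are made up, and the wheel file names in the usage note are illustrative rather than checked against PyPI.

```python
import platform


def wheel_platform_tags(wheel_filename):
    """Split out the platform tags of a wheel file name:
    name-version[-build]-python-abi-platform.whl"""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {wheel_filename!r}")
    parts = wheel_filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {wheel_filename!r}")
    # Compressed tag sets are joined with "." in the last component.
    return parts[-1].split(".")


def supports_machine(wheel_filename, machine=None):
    """Rough check: does any platform tag target this CPU architecture?"""
    machine = machine or platform.machine()
    return any(
        tag == "any" or tag.endswith("_" + machine)
        for tag in wheel_platform_tags(wheel_filename)
    )
```

With an illustrative name such as `Fiona-1.8.21-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`, `supports_machine(name, "aarch64")` is false, so on an arm Linux container pip would fall back to the source distribution and need a system GDAL to build against.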
Now that itself isn't particularly idiot-ish, except that this is the second time in a week I have had (and had diagnosed for me) this problem. The other one was polars (pandas alternative).
@rouault on your point of using conda:
- conda is not part of the standard python ecosystem
- conda is primarily used in the datascience area - hardly at all for normal python development
- with the state of pip in 2022 and the widespread use of manylinux wheels conda is mostly obsolete
- if one wants to build a python package (for pypi) on top of gdal - relying on conda is no option
- conda is not free for commercial use - with this it is not an option for many teams/companies (official source)
Especially the last point goes against the very spirit of the open source community. For the distribution of OSS we should not rely on a pay to use package repository - especially not as the go-to solution.
A few responses specifically on the conda topic:
with the state of pip in 2022 and the widespread use of manylinux wheels conda is mostly obsolete
While the state of pip/wheels certainly has improved a lot, the case being discussed here (being able to depend on a GDAL python wheel for other packages such as rasterio or fiona to build against) is, whether you like it or not, something that currently cannot be done with pip, and is specifically solved by conda.
To be clear, that doesn't mean there cannot be a GDAL python wheel just for users of the GDAL native python bindings. It's just that this wheel cannot be reused for other packages (which is a large part of the discussion above).
conda is not free for commercial use - with this it is not an option for many teams/companies (official source)
Small clarification: conda the package manager is a free-to-use open source project. It's only the Anaconda distribution and installing from anaconda.com's default channel that are limited by their Terms of Service; installing from the conda-forge channel is not affected by the TOS (https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos/).
There are two discussions getting mixed here in this issue:
- Could the GDAL project build wheels for its Python bindings (and upload to PyPI, https://pypi.org/project/GDAL)?
- Could those wheels be built and used in such a way to allow sharing its GDAL library with other Python packages that depend on GDAL (rasterio, fiona, etc)?
For the first issue (I am not a GDAL maintainer, but interpreting what was said before, eg #2166 (comment), #3060 (comment)), I think the idea of providing wheels is generally welcomed, but requires someone stepping up to do the work ("a champion to lead the effort").
The second issue is much more complex, and AFAIK not something that can be solved by GDAL but requires changes in the broader python packaging space (see my comment above #3060 (comment)).
(and it is mainly here, if you want to be able to share libraries, that the reference to use conda is most relevant)
I suppose many are mostly interested in the first issue (being able to more easily install the GDAL python bindings with pip). It might be worth opening a separate issue for that to distinguish both discussions.
but requires someone stepping up to do the work ("a champion to lead the effort").
a champion for bootstrapping, and then an automated process. The process of manually building binary wheels for each release wouldn't be sustainable in the long term. The nice thing with conda-forge is the automation and transparency in build recipes. As far as I know for pip binary wheels, everyone has their own custom, somewhat opaque way of generating wheels, which is tricky when you don't have direct access to the various build OSes. GDAL and its ecosystem (the fact that there are several python packages that share the same binary dependencies with GDAL, things like fiona, rasterio, pyproj, pygeos) are probably among the worst candidates to work nicely with pip as it is currently. Another difficulty with GDAL is that doing binary builds of it is a trade-off of multiple factors and there's no good answer: do you want a minimal GDAL build with just a few popular drivers (good luck to have people agree on which few popular drivers should be included) ? or a large one with ~ all (open source) dependencies ? or one with only permissive and LGPL-like dependencies ? or one with GPL dependencies ? ...
Regarding automation, in pyogrio we are quite happy with our current solution of using Github Actions with cibuildwheel (and in our case also using vcpkg to build GDAL, this might be different for GDAL itself if you already have build scripts for each platform): https://github.com/geopandas/pyogrio/blob/f16009e26bc9982a531bc4f8570fdf6d6dfa6829/.github/workflows/release.yml#L81-L182 (and fiona recently adopted this as well for providing windows wheels)
but requires someone stepping up to do the work ("a champion to lead the effort").
... Another difficulty with GDAL is that doing binary builds of it is a trade-off of multiple factors and there's no good answer: do you want a minimal GDAL build with just a few popular drivers (good luck to have people agree on which few popular drivers should be included) ? or a large one with ~ all (open source) dependencies ? or one with only permissive and LGPL-like dependencies ? or one with GPL dependencies ? ...
Regarding the dependencies, to some extent, this can be solved by offering plugins, see my efforts here:
https://pypi.org/project/gdal-ecw/
https://pypi.org/project/gdal-sid/
https://github.com/talos-gis/gdal-sid
https://github.com/talos-gis/gdal-ecw
Which can be easily installed alongside the unofficial windows binary wheels:
https://www.lfd.uci.edu/~gohlke/pythonlibs/
(side note: I see that he wrote "archived" at the top and that there are no 3.5 wheels; AFAIK this is/was the only source for regular windows wheels...)
I have configured pipelines for building binary wheels at https://gitlab.com/mentaljam/gdal-wheels. Wheels are published to GitLab's Python package registry. Readme contains some usage examples.
I have configured pipelines for building binary wheels at https://gitlab.com/mentaljam/gdal-wheels. Wheels are published to GitLab's Python package registry. Readme contains some usage examples.
Do you have more details on how you set up the runners? I see that it's referenced by the self-hosted-linux and self-hosted-windows tags.
@DruidNx, first I tried GitLab's shared Windows runner. However, it has a fixed timeout of 2 hours, which is not enough to compile all the dependencies. So, I had to configure a self-hosted shell runner on my PC. Later I configured the building of manylinux wheels and set up a self-hosted Docker runner right away. To be honest, I did not try a shared Linux runner. Maybe it will be sufficient for building GDAL.
UPD: Two hours may sound like a long time, but shared runners are limited in CPU resources, which dramatically increases compilation times
Thanks, I was able to get the wheels, but it's missing the binary tools like ogr2ogr, ogrinfo, etc.
Issue: https://gitlab.com/mentaljam/gdal-wheels/-/issues/1