OSGeo/gdal

Support pip binary wheel manylinux installations

dazza-codes opened this issue · 24 comments

rasterio, shapely, pyproj etc. now provide pure-pip installations with binary wheels that provide manylinux [1] binaries for gdal, proj, geos etc.; can this project also provide binary wheel installations? e.g. see

Potentially Shared Libs

Any way it could work would be great, but it might be optimal if some common binary libs for gdal/ogr could be shared among the various python libraries that require them. (The same could apply to proj/pyproj.) That is, not an OS-installed shared lib, but a python manylinux [1] binary shared lib (in e.g. {project-venv-path}/lib/, alongside {project-venv-path}/lib/python3.6).

[1] https://github.com/pypa/manylinux

What appears to be happening is that several related but independent pypi projects each install their own copies of various possibly-shared libs, e.g. rasterio and fiona both install their own copies of gdal_data into e.g.

  • .../lib/python3.6/site-packages/fiona/gdal_data/*
  • .../lib/python3.6/site-packages/rasterio/gdal_data/*

The same applies to proj_data, i.e.

  • .../lib/python3.6/site-packages/fiona/proj_data/*
  • .../lib/python3.6/site-packages/rasterio/proj_data/*

For rasterio alone, the binary packages are substantial, e.g.

        0  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/
   376744  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libopenjp2-8f6da918.so.2.3.0
  2240704  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libgeos--no-undefined-b94097bf.so
    16792  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libsz-978a1c7f.so.2.0.1
  1165464  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libsqlite3-a9c9c58e.so.0.8.6
   203512  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libnghttp2-11cb20b8.so.14.17.1
    44424  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libjson-c-0c137dce.so.2.0.2
    87848  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libz-a147dcb0.so.1.2.3
   445144  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libwebp-dc7313d0.so.7.0.5
 23783432  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libgdal-b29a5f73.so.20.5.4
  1576712  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libnetcdf-02a36646.so.13.1.1
   182056  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libhdf5_hl-308f82c1.so.100.1.2
  3855616  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libhdf5-9d9b49cc.so.103.1.0
  3620912  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libcurl-ea538880.so.4.4.0
   175000  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libexpat-c4a93fc7.so.1.6.8
   218400  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libpng16-de469eac.so.16.35.0
   354216  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libgeos_c-a68605fd.so.1.13.1
    33624  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libaec-21547b1b.so.0.0.10
   250488  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libjpeg-3b10b538.so.9.3.0
   453488  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libproj-cd06b982.so.12.0.0

When combined with a few more related projects, there is some potential duplication of the libs and closer inspection of the lib-versions suggests there could be some inconsistency in the versions installed (without any explicit pip options to control those binary lib versions packaged), e.g.

        0  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/
    87856  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libz-fiona-a147dcb0.so.1.2.3
    60800  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libjson-c-fiona-5f02f62c.so.2.0.2
 21884960  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libgdal-fiona-9fe15c06.so.20.5.4
   354224  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libgeos_c-fiona-a68605fd.so.1.13.1
  3620920  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libcurl-fiona-ea538880.so.4.4.0
  1261392  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libsqlite3-fiona-25a4bc97.so.0.8.6
   175008  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libexpat-fiona-c4a93fc7.so.1.6.8
   203520  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libnghttp2-fiona-11cb20b8.so.14.17.1
  2240712  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libgeos--no-undefined-fiona-b94097bf.so
   344704  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libjpeg-fiona-3fe7dfc0.so.9.3.0
   279840  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libpng16-fiona-898afbbd.so.16.35.0
   453488  2020-10-13 10:23   python/lib/python3.6/site-packages/Fiona.libs/libproj-fiona-cd06b982.so.12.0.0
        0  2020-10-13 10:23   python/lib/python3.6/site-packages/pyproj/.libs/
    92080  2020-10-13 10:23   python/lib/python3.6/site-packages/pyproj/.libs/libz-eb09ad1d.so.1.2.3
  1271584  2020-10-13 10:23   python/lib/python3.6/site-packages/pyproj/.libs/libsqlite3-b65a32f0.so.0.8.6
  8155504  2020-10-13 10:23   python/lib/python3.6/site-packages/pyproj/.libs/libproj-d352b7c6.so.15.2.1
        0  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/
   376744  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libopenjp2-8f6da918.so.2.3.0
  2240704  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libgeos--no-undefined-b94097bf.so
    16792  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libsz-978a1c7f.so.2.0.1
  1165464  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libsqlite3-a9c9c58e.so.0.8.6
   203512  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libnghttp2-11cb20b8.so.14.17.1
    44424  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libjson-c-0c137dce.so.2.0.2
    87848  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libz-a147dcb0.so.1.2.3
   445144  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libwebp-dc7313d0.so.7.0.5
 23783432  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libgdal-b29a5f73.so.20.5.4
  1576712  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libnetcdf-02a36646.so.13.1.1
   182056  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libhdf5_hl-308f82c1.so.100.1.2
  3855616  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libhdf5-9d9b49cc.so.103.1.0
  3620912  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libcurl-ea538880.so.4.4.0
   175000  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libexpat-c4a93fc7.so.1.6.8
   218400  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libpng16-de469eac.so.16.35.0
   354216  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libgeos_c-a68605fd.so.1.13.1
    33624  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libaec-21547b1b.so.0.0.10
   250488  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libjpeg-3b10b538.so.9.3.0
   453488  2020-10-13 10:23   python/lib/python3.6/site-packages/rasterio.libs/libproj-cd06b982.so.12.0.0
        0  2020-10-13 10:23   python/lib/python3.6/site-packages/numpy.libs/
    92080  2020-10-13 10:23   python/lib/python3.6/site-packages/numpy.libs/libz-eb09ad1d.so.1.2.3
 30077440  2020-10-13 10:23   python/lib/python3.6/site-packages/numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
  2260064  2020-10-13 10:23   python/lib/python3.6/site-packages/numpy.libs/libgfortran-2e0d59d6.so.5.0.0
   263992  2020-10-13 10:23   python/lib/python3.6/site-packages/numpy.libs/libquadmath-2d0c479f.so.0.0.0
        0  2020-10-13 10:23   python/lib/python3.6/site-packages/shapely/.libs/
  2240704  2020-10-13 10:23   python/lib/python3.6/site-packages/shapely/.libs/libgeos--no-undefined-b94097bf.so
   354216  2020-10-13 10:23   python/lib/python3.6/site-packages/shapely/.libs/libgeos_c-a68605fd.so.1.13.1
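
One way to put a number on the duplication noted above is to hash every vendored shared object and report the ones that are byte-identical. The following is only a minimal sketch, not part of any of the packages mentioned: `find_duplicate_libs` is a hypothetical helper name, and it assumes a Linux environment with GNU `sha256sum` available. Libraries that were patched differently at wheel-build time can differ by a few bytes (several same-named pairs in the listings differ in size by 8 bytes) and would not be reported, so this probably undercounts the real overlap.

```shell
# Hypothetical helper (assumption, not from any project mentioned here):
# print "<sha256>  <path>" for every shared object under a site-packages
# directory that is byte-identical to at least one other shared object.
find_duplicate_libs () {
  site=$1
  # Hash every regular file whose name looks like a shared object,
  # sort so that identical hashes become adjacent, then keep only
  # the lines whose hash occurs more than once.
  find "$site" -type f -name '*.so*' -exec sha256sum {} + \
    | sort \
    | awk '{
        hash = $1
        if (hash == prev_hash) {
          # First repeat of this hash: also emit the line we held back.
          if (!printed) { print prev_line; printed = 1 }
          print $0
        } else {
          printed = 0
        }
        prev_hash = hash
        prev_line = $0
      }'
}
```

It could be run as `find_duplicate_libs "$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"`; an empty result means no two vendored libraries are exactly identical.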

The request is made on this project because it is very close to the C source used by some common python wrappers, so it might provide a "source of truth" about how to package some common libs using pip/manylinux wheels. (The same request could perhaps apply to the C source for proj.) Obviously the manylinux builds would need to provide version-specific builds. (I don't have a clear idea of how they would be provided as shared libs, only that the duplications and inconsistencies observed above might benefit from some kind of shared-libs solution that is not an OS package solution, despite how much work goes into those.)

@dazza-codes

can this project also provide binary wheel installations?

See #2166 (comment)

BTW, for questions GDAL uses the mailing list https://lists.osgeo.org/mailman/listinfo/gdal-dev

This issue is more of a feature request for packaging than a question.

This request for optimized, shared libs is also related to stripping binaries for a smaller footprint, see also

Packaging GDAL is a significant effort. For Python users, while this is not equivalent to the pip experience, Conda is probably the best way to go, since it provides PROJ and GDAL binaries shared among several packages.

This feature request is specific to pip installations using binary wheels; although the conda option is useful, it's irrelevant to resolving this specific feature request, unless something about that solution can be applied in some way to a common, shared pip installation. A pointer to the details of that solution might be helpful here. Clearly, Fiona and rasterio have already solved some of the problem of packaging libgdal for python-pip installations, so this feature request is only about providing a common denominator for any packages that consume a common binary library. A possible solution is to fork and adapt https://github.com/rasterio/rasterio-wheels for the gdal project and then rasterio/Fiona might depend on a common pip-gdal dependency that provides a binary libgdal built with https://github.com/matthew-brett/multibuild (?). (Similarly for libgeos, libproj etc I guess.)

Clearly, Fiona and rasterio have already solved some of the problem of packaging libgdal for python-pip installations, so this feature request is only about providing a common denominator for any packages that consume a common binary library

I'm not sure there's an appetite of the rasterio team to collaborate on this, since they see the wheels as a competitive advantage over the current situation of gdal-python that doesn't provide binary wheels. If there's no collaboration, that could result in some Fiona/rasterio version using some GDAL version, and the gdal-python one using another one, and people at runtime loading both would get clashes/crashes.
Conda based approaches look to me a more solid approach than ad-hoc pip wheels where GDAL dependencies aren't necessarily updated. Wondering to which extent taking the .so, .dylib, .dll from Conda and repackaging them to be wheel compatible wouldn't save some duplication of work ?
Anyway, this would need a champion to lead the effort.

My experiments on this currently lead to a post-pip install hack like (don't do this at home):

hack_shared_libs () {
  site=$1

  export GDAL_DATA="${site}/share/gdal_data"
  export PROJ_DATA="${site}/share/proj_data"
  mkdir -p "${GDAL_DATA}"
  mkdir -p "${PROJ_DATA}"

  export SHARED_LIBS="${site}/share/libs"
  mkdir -p "${SHARED_LIBS}"

  find "${site}" -type d -name 'gdal_data' | while read -r data_path; do
    if [ "$data_path" != "$GDAL_DATA" ]; then
      rsync -auq "$data_path"/ "$GDAL_DATA"/
      rm -rf "$data_path"
      ln -s "$GDAL_DATA" "$data_path"
    fi
  done

  find "${site}" -type d -name 'proj_data' | while read -r data_path; do
    if [ "$data_path" != "$PROJ_DATA" ]; then
      rsync -auq "$data_path"/ "$PROJ_DATA"/
      rm -rf "$data_path"
      ln -s "$PROJ_DATA" "$data_path"
    fi
  done

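  # Updating the LD_LIBRARY_PATH can fix symbol resolution
  # (the :+ expansion avoids a trailing ':' when LD_LIBRARY_PATH is unset;
  # an empty entry would be treated as the current directory)
  export LD_LIBRARY_PATH="$SHARED_LIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"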

  move_to_shared_libs () {
    lib_path=$1
    if [ -d "$lib_path" ]; then
      rsync -auq "$lib_path"/ "$SHARED_LIBS"/
      rm -rf "$lib_path"
      ln -s "$SHARED_LIBS" "$lib_path"
    fi
  }

  move_to_shared_libs "$site"/rasterio.libs
  move_to_shared_libs "$site"/Fiona.libs
  move_to_shared_libs "$site"/numpy.libs
  move_to_shared_libs "$site"/pyproj/.libs
  move_to_shared_libs "$site"/shapely/.libs

  # TODO: remove this hack on shapely/geos.py
  # due to https://github.com/Toblerity/Shapely/issues/1013
  # try a hack to patch shapely/geos.py
  patch "$site"/shapely/geos.py "$SCRIPT_PATH"/patches/shapely/geos.patch

# # To check for missing symbols, use:
# find "$SHARED_LIBS"/ -name "*.so*" | while read lib_name; do
#   ldd -r "$lib_name" 2>&1
# done

}

Using it requires setting some env-vars like:

package_dst=$(python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')
# experimental option:
hack_shared_libs "$package_dst"
# these env-vars are required for hack_shared_libs
export GDAL_DATA="${package_dst}/share/gdal_data"
export PROJ_DATA="${package_dst}/share/proj_data"
# the :+ expansion avoids a trailing ':' when LD_LIBRARY_PATH is unset
export LD_LIBRARY_PATH="${package_dst}/share/libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

This is subsequently tested by running a project pytest suite on the modified venv site-packages that has the share/libs modifications. I'm not proposing any general use of such a hack, just noting that the experiment works (but requires a patch on shapely/geos.py to find libgeos OK).

I fully get that there's possible reluctance to work on it and preferences to use conda vs. pure-pip and various competition among packages, but we all stand on the shoulders of giants in one way or another and all the packaging solutions get better in various ways, so an open mind to all the evolution is useful. I don't expect this to be resolved anytime soon and if there is some kind of perceived pressure to close issues promptly, so be it, but otherwise it might help to leave it open/unresolved. I can't go out on a limb to try to do a bunch of work myself on it unless there is support for it. While a solid solution would be ideal, all I can do is hack something for now. I don't know enough about the details of both the conda-packaging and the pip-packaging/multibuild/wheels to see a simple solution right away - open to useful pointers.

While this hack is nasty, there is something appealing about the shared-libs experiment:

$ cd /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/

$ ls -l rasterio.libs Fiona.libs numpy.libs shapely/.libs pyproj/.libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 Fiona.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 numpy.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 pyproj/.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 rasterio.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs
lrwxrwxrwx 1 joe joe 59 Oct 30 21:53 shapely/.libs -> /tmp/tmp_venv_n65eKJ/lib/python3.6/site-packages/share/libs

$ ls -l share/libs/
total 110804
-rwxr-xr-x 1 joe joe    35656 Oct 30 21:53 libaec-f0d4887b.so.0.0.10
-rwxr-xr-x 1 joe joe  3532904 Oct 30 21:53 libcurl-ea538880.so.4.4.0
-rwxr-xr-x 1 joe joe  3532912 Oct 30 21:53 libcurl-fiona-ea538880.so.4.4.0
-rwxr-xr-x 1 joe joe   222320 Oct 30 21:53 libexpat-09c47d4c.so.1.6.8
-rwxr-xr-x 1 joe joe   172944 Oct 30 21:53 libexpat-fiona-c4a93fc7.so.1.6.8
-rwxr-xr-x 1 joe joe 23787528 Oct 30 21:53 libgdal-044c25e5.so.20.5.4
-rwxr-xr-x 1 joe joe 21884960 Oct 30 21:53 libgdal-fiona-9fe15c06.so.20.5.4
-rwxr-xr-x 1 joe joe   323632 Oct 30 21:53 libgeos_c-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe   323640 Oct 30 21:53 libgeos_c-fiona-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe  2240704 Oct 30 21:53 libgeos--no-undefined-b94097bf.so
-rwxr-xr-x 1 joe joe  2240712 Oct 30 21:53 libgeos--no-undefined-fiona-b94097bf.so
-rwxr-xr-x 1 joe joe  2260064 Oct 30 21:53 libgfortran-2e0d59d6.so.5.0.0
-rwxr-xr-x 1 joe joe  4236544 Oct 30 21:53 libhdf5-4377e0cf.so.103.1.0
-rwxr-xr-x 1 joe joe   186152 Oct 30 21:53 libhdf5_hl-92c1cdd8.so.100.1.2
-rwxr-xr-x 1 joe joe   342720 Oct 30 21:53 libjpeg-3fe7dfc0.so.9.3.0
-rwxr-xr-x 1 joe joe   342720 Oct 30 21:53 libjpeg-fiona-3fe7dfc0.so.9.3.0
-rwxr-xr-x 1 joe joe    58800 Oct 30 21:53 libjson-c-5f02f62c.so.2.0.2
-rwxr-xr-x 1 joe joe    58808 Oct 30 21:53 libjson-c-fiona-5f02f62c.so.2.0.2
-rwxr-xr-x 1 joe joe  1822440 Oct 30 21:53 libnetcdf-07221d8a.so.13.1.1
-rwxr-xr-x 1 joe joe   205616 Oct 30 21:53 libnghttp2-11cb20b8.so.14.17.1
-rwxr-xr-x 1 joe joe   205624 Oct 30 21:53 libnghttp2-fiona-11cb20b8.so.14.17.1
-rwxr-xr-x 1 joe joe 30077440 Oct 30 21:53 libopenblasp-r0-ae94cfde.3.9.dev.so
-rwxr-xr-x 1 joe joe   378776 Oct 30 21:53 libopenjp2-8f6da918.so.2.3.0
-rwxr-xr-x 1 joe joe   281944 Oct 30 21:53 libpng16-898afbbd.so.16.35.0
-rwxr-xr-x 1 joe joe   281952 Oct 30 21:53 libpng16-fiona-898afbbd.so.16.35.0
-rwxr-xr-x 1 joe joe   453488 Oct 30 21:53 libproj-cd06b982.so.12.0.0
-rwxr-xr-x 1 joe joe  8155504 Oct 30 21:53 libproj-d352b7c6.so.15.2.1
-rwxr-xr-x 1 joe joe   453488 Oct 30 21:53 libproj-fiona-cd06b982.so.12.0.0
-rwxr-xr-x 1 joe joe   261912 Oct 30 21:53 libquadmath-2d0c479f.so.0.0.0
-rwxr-xr-x 1 joe joe  1273568 Oct 30 21:53 libsqlite3-b65a32f0.so.0.8.6
-rwxr-xr-x 1 joe joe  1421520 Oct 30 21:53 libsqlite3-bc0a2dd7.so.0.8.6
-rwxr-xr-x 1 joe joe  1259400 Oct 30 21:53 libsqlite3-fiona-25a4bc97.so.0.8.6
-rwxr-xr-x 1 joe joe    18760 Oct 30 21:53 libsz-53d02de5.so.2.0.1
-rwxr-xr-x 1 joe joe   783120 Oct 30 21:53 libwebp-fbd93615.so.7.0.5
-rwxr-xr-x 1 joe joe    85656 Oct 30 21:53 libz-a147dcb0.so.1.2.3
-rwxr-xr-x 1 joe joe    94144 Oct 30 21:53 libz-eb09ad1d.so.1.2.3
-rwxr-xr-x 1 joe joe    85664 Oct 30 21:53 libz-fiona-a147dcb0.so.1.2.3

Wondering to which extent taking the .so, .dylib, .dll from Conda and repackaging them to be wheel compatible wouldn't save some duplication of work ?

https://github.com/conda-incubator/conda-press could be interesting (although work on it seems to have stalled)

although the conda option is useful, it's irrelevant to resolving this specific feature request,

The reference to conda is not that irrelevant, in the sense that conda has been specifically designed to overcome this limitation of wheels to handle (non-python) shared libraries.

As @rouault already mentioned, packaging is a big effort. Moreover, this doesn't only need someone stepping up to do the work in the actual geospatial packages (gdal, (py)proj, geos, rasterio, shapely, etc), but also in the general python packaging ecosystem. Currently, wheels are not designed to link to a specific build of another wheel to share libraries like this (you can pin versions, but not exact builds), and there is no way to know if another wheel is compatible or not. For example, numpy's and scipy's wheels each include their own copy of openblas, and don't share it (it's a similar problem to the one you raise here, and they haven't solved it yet).
There has been talk about improving this situation, see e.g. the "pynativelib" proposal of Nathaniel Smith at pypa/wheel-builders#2, but AFAIK not a lot has changed in the last few years. There is a lot of related discussion at pypa/packaging-problems#25.

I've spent some time to review some links above and still need time to further review details of conda packaging, auditwheel and conda-press. I want to better understand some of the ABI issues and the so called "thin" vs. "fat" library installs and paths, e.g. in my hacked share/libs there appear to be the same .so versions with various prefixes and no obvious way to prune them using some kind of sem-ver dep-tree solution:

-rwxr-xr-x 1 joe joe 23787528 Oct 30 21:53 libgdal-044c25e5.so.20.5.4
-rwxr-xr-x 1 joe joe 21884960 Oct 30 21:53 libgdal-fiona-9fe15c06.so.20.5.4
-rwxr-xr-x 1 joe joe   323632 Oct 30 21:53 libgeos_c-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe   323640 Oct 30 21:53 libgeos_c-fiona-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe  2240704 Oct 30 21:53 libgeos--no-undefined-b94097bf.so
-rwxr-xr-x 1 joe joe  2240712 Oct 30 21:53 libgeos--no-undefined-fiona-b94097bf.so
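
As a first step towards seeing which files are candidates for pruning, the tagged names can be normalized and counted. This is a sketch only: `normalize_lib_name` and `count_vendored_libs` are hypothetical helpers, and treating the 8-hex-digit tag and the `-fiona` infix as pure packaging decoration is an assumption based on the names in the listing. Two files with the same normalized name are not necessarily the same build or interchangeable.

```shell
# Hypothetical helper (assumption): drop the 8-hex-digit tag, and an
# optional "-fiona" infix as seen in the listing, that precedes ".so".
#   libgdal-fiona-9fe15c06.so.20.5.4 -> libgdal.so.20.5.4
normalize_lib_name () {
  printf '%s\n' "$1" | sed -E 's/(-fiona)?-[0-9a-f]{8}(\.so)/\2/'
}

# Count how many files in a directory share each normalized name,
# most frequent first. Names without such a tag are counted unchanged.
count_vendored_libs () {
  lib_dir=$1
  for path in "$lib_dir"/*.so*; do
    [ -e "$path" ] || continue     # skip the unexpanded glob in an empty dir
    normalize_lib_name "$(basename "$path")"
  done | sort | uniq -c | sort -rn
}
```

Running `count_vendored_libs share/libs` gives one line per normalized name with a leading count; any count above 1 marks a library that is present in several tagged copies.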

As I understand it at present, the argument is that conda properly solves the shared-libs problems and pip wheels use auditwheel to avoid library conflicts. The build complexity to support all platforms and releases is massive.

I guess my hope in opening this issue on this project was that it might be a common denominator for other pythonic libraries to depend on for a binary libgdal (and similarly for proj, libgeos, etc.). I don't know, but it seems like this would require something like the pynativelib proposal or some kind of symlink solution like the one Ubuntu uses in its alternative selections [1]. It's difficult to see how this could all avoid replicating an entire package manager solution (conda, apt, yum, etc).

[1] http://manpages.ubuntu.com/manpages/trusty/man8/update-alternatives.8.html

to avoid library conflicts

If libgdal-foo.so and libgdal-bar.so are loaded in the same process, this is going to crash at runtime due to -foo using some symbols of -bar, or the reverse. With DLLs, such symbol mixing seems to be less likely due to how symbol resolution works, but on Linux, such .so hell happens in practice.

The current practice in the rasterio and fiona wheels is to package .so libs within subdirectories of those site-packages, with some filename identifiers for those .so files that attempt to "isolate" the libs from each other. Is it true that loading rasterio and fiona wheels in the same process, which would try to load symbols from separate .so files, could lead to in-memory symbol mixing? (If so, is that an important argument or a necessary practice for using common shared libs? What happens with conda, Debian or other OS installations that install multiple versions of the gdal lib, or is that not possible without symbol conflicts?) Part of the motivation for this issue is to reduce package sizes, but symbol resolution is paramount and AFAIK the current wheel builds (auditwheel) patch a few things to avoid conflicts. I have not yet done a closer study of auditwheel to understand the requirements and patches it applies (I can only assume it works).
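
Whether two copies of libgdal really end up in the same process can at least be observed on Linux by reading `/proc/<pid>/maps`. The sketch below is illustrative only: `loaded_libs_matching` is a hypothetical helper, it assumes a Linux procfs and library paths without spaces, and it only shows which files are mapped. It neither proves nor rules out symbol mixing between them.

```shell
# Hypothetical helper (assumption): list the distinct file paths mapped
# into a running process whose path matches an awk regular expression.
loaded_libs_matching () {
  pid=$1
  pattern=$2
  # Each line of /proc/<pid>/maps describes one mapping; when the mapping
  # is backed by a file, the path is the 6th whitespace-separated field.
  awk -v pat="$pattern" '$6 ~ pat { print $6 }' "/proc/$pid/maps" | sort -u
}
```

For example, start `python -c 'import rasterio, fiona, time; time.sleep(60)' &` and then run `loaded_libs_matching "$!" libgdal`; two distinct paths in the output would mean that both vendored copies are mapped into the same interpreter.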

It would be great to see this issue revived. Installing downstream packages like fiona (in order to install geopandas) that do not supply gdal binaries is a slightly frustrating experience when one is used to working in python virtual environments using pip-compile for dependency/version resolution.

I'm aware of the conda package, but can't use it in our workflow since we don't rely on conda environments. What is the advantage conda has here for packaging that can't be reproduced in pip?

I was surprised to see that rasterio provides the gdal binary as part of their installation. Considering that there will undoubtedly be more packages in the future that rely on gdal, wouldn't it be good practice to provide a pip-installable installation for them?

@thomasaarholt having python gdal wheels will not provide the GDAL library for fiona.

I was surprised to see that rasterio provides the gdal binary as part of their installation

rasterio provides gdal libs, but for its own internal use, not the binary.

What is the advantage conda has here for packaging that can't be reproduced in pip?

Conda has the fundamental advantage of being a general package manager, not just a python package manager. So with conda you can install gdal itself, and install python packages that depend on this gdal package.
With pip this is not possible, since it can only deal with python packages. And thus the gdal library needs to be included in a python package that needs it (which is what fiona, rasterio, etc are doing).

See my comment above #3060 (comment) for some more details on that.

Installing downstream packages like fiona (in order to install geopandas) that do not supply gdal binaries

Note that fiona does supply binaries for linux and mac, only not for windows (which is of course an important missing piece, but just to point out).

Oh my god. I'm an idiot-ish. First off, thanks for your prompt replies, they make sense!

On where I was an idiot: I recently switched to an M1 mac, on which I've been trying to install geopandas in a docker image running debian linux. Now this debian image runs on "native" arm architecture. And there (currently) aren't fiona binaries for arm architecture linux... (I'll write an issue on their github).

Now that itself isn't particularly idiot-ish, except that this is the second time in a week I have had (and had diagnosed for me) this problem. The other one was polars (pandas alternative).

@rouault on your point of using conda:

  • conda is not part of the standard python ecosystem
  • conda is primarily used in the datascience area - hardly at all for normal python development
  • with the state of pip in 2022 and the widespread use of manylinux wheels conda is mostly obsolete
  • if one wants to build a python package (for pypi) on top of gdal - relying on conda is no option
  • conda is not free for commercial use - with this it is not an option for many teams/companies (official source)

Especially the last point goes against the very spirit of the open source community. For the distribution of OSS we should not rely on a pay to use package repository - especially not as the go-to solution.

A few responses specifically on the conda topic:

with the state of pip in 2022 and the widespread use of manylinux wheels conda is mostly obsolete

While the state of pip/wheels certainly has improved a lot, the case being discussed here (being able to depend on a GDAL python wheel for other packages such as rasterio or fiona to build against) is, whether you like it or not, something that currently cannot be done with pip, and is specifically solved by conda.
To be clear, that doesn't mean there cannot be a GDAL python wheel just for users of the GDAL native python bindings. It's just that this wheel cannot be reused for other packages (which is a large part of the discussion above).

conda is not free for commercial use - with this it is not an option for many teams/companies (official source)

Small clarification: conda the package manager is a free to use open source project. It's only the Anaconda distribution and installing from anaconda.com's default channel that is limited by their Terms Of Service, but installing from the conda-forge channel is not affected by the TOS (https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos/)


There are two discussions getting mixed here in this issue:

  • Could the GDAL project build wheels for its Python bindings (and upload to PyPI, https://pypi.org/project/GDAL)?
  • Could those wheels be built and used in such a way to allow sharing its GDAL library with other Python packages that depend on GDAL (rasterio, fiona, etc)?

For the first issue (I am not a GDAL maintainer, but interpreting what was said before, eg #2166 (comment), #3060 (comment)), I think the idea of providing wheels is generally welcomed, but requires someone stepping up to do the work ("a champion to lead the effort").

The second issue is much more complex, and AFAIK not something that can be solved by GDAL but requires changes in the broader python packaging space (see my comment above #3060 (comment)).
(and it is mainly here, if you want to be able to share libraries, that the reference to use conda is most relevant)

I suppose many are mostly interested in the first issue (being able to more easily install the GDAL python bindings with pip). It might be worth opening a separate issue for that to distinguish both discussions.

but requires someone stepping up to do the work ("a champion to lead the effort").

A champion for bootstrapping, and then an automated process. The process of manually building binary wheels for each release wouldn't be sustainable in the long term. The nice thing with conda-forge is the automation and transparency in build recipes. As far as I know for pip binary wheels, everyone has their own custom, somewhat opaque way of generating wheels, which is tricky when you don't have direct access to the various build OS. GDAL and its ecosystem (the fact that there are several python packages that share the same binary dependencies with GDAL, things like fiona, rasterio, pyproj, pygeos) are probably among the worst candidates to work nicely with pip as it is currently. Another difficulty with GDAL is that doing binary builds of it is a trade-off of multiple factors and there's no good answer: do you want a minimal GDAL build with just a few popular drivers (good luck to have people agree on which few popular drivers should be included) ? or a large one with ~ all (open source) dependencies ? or one with only permissive and LGPL-like dependencies ? or one with GPL dependencies ? ...

Regarding automation, in pyogrio we are quite happy with our current solution of using Github Actions with cibuildwheel (and in our case also using vcpkg to build GDAL, this might be different for GDAL itself if you already have build scripts for each platform): https://github.com/geopandas/pyogrio/blob/f16009e26bc9982a531bc4f8570fdf6d6dfa6829/.github/workflows/release.yml#L81-L182 (and fiona recently adopted this as well for providing windows wheels)

but requires someone stepping up to do the work ("a champion to lead the effort").

... Another difficulty with GDAL is that doing binary builds of it is a trade-off of multiple factors and there's no good answer: do you want a minimal GDAL build with just a few popular drivers (good luck to have people agree on which few popular drivers should be included) ? or a large one with ~ all (open source) dependencies ? or one with only permissive and LGPL-like dependencies ? or one with GPL dependencies ? ...

Regarding the dependencies, to some extent, this can be solved by offering plugins, see my efforts here:
https://pypi.org/project/gdal-ecw/
https://pypi.org/project/gdal-sid/
https://github.com/talos-gis/gdal-sid
https://github.com/talos-gis/gdal-ecw

Which can be easily installed alongside the unofficial windows binary wheels:
https://www.lfd.uci.edu/~gohlke/pythonlibs/
(side note: I see that he wrote archived at the top and that there are no 3.5 wheels, AFAIK this is/was the only source for regular windows wheels...)

I have configured pipelines for building binary wheels at https://gitlab.com/mentaljam/gdal-wheels. Wheels are published to GitLab's Python package registry. Readme contains some usage examples.

I have configured pipelines for building binary wheels at https://gitlab.com/mentaljam/gdal-wheels. Wheels are published to GitLab's Python package registry. Readme contains some usage examples.

Do you have more details on how you set up the runners? I see that it's referenced by the self-hosted-linux and self-hosted-windows tags.

@DruidNx, first I tried GitLab's shared Windows runner. However, it has a fixed timeout of 2 hours, which is not enough to compile all the dependencies. So, I had to configure a self-hosted shell runner on my PC. Later I configured the building of manylinux wheels and set up a self-hosted Docker runner right away. To be honest, I did not try a shared Linux runner. Maybe it will be sufficient for building GDAL.

UPD: Two hours may sound like a long time, but shared runners are limited in CPU resources, which dramatically increases compilation times

Thanks, I was able to get the wheels, but it's missing the binary tools like ogr2ogr, ogrinfo, etc.
Issue: https://gitlab.com/mentaljam/gdal-wheels/-/issues/1