pangeo-data/jupyter-earth

Using Julia together with a Python environment set up via CondaPkg.jl that installs the Python package fiona

JordiBolibar opened this issue · 17 comments

I'm encountering a weird error with some Python/Julia compatibility. The error is the following one, which is analyzed in this issue.

Since I don't have admin rights on the Hub, would it be possible to run the following command to update libstdc++ in the Julia installation? The command would look like: cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /srv/julia/lib/julia

Thanks a lot in advance!

This has now been deployed, new user servers started will have this change!

This didn't seem to fix the issue, so here is more information on what is happening so far.

I'm using CondaPkg.jl, a Julia package to manage a Python and conda installation to be used together with Julia. This allows other libraries, such as PythonCall.jl, to be able to call Python from Julia. Moreover, it also provides a reproducible conda environment to be used together with the Julia environment. One of the Python libraries that I'm using is OGGM, which in turn uses both Fiona and Geopandas. The issue seems to be linked to these geospatial Python libraries, and it is possibly related to GDAL as well. This is what I tried so far:

  • Updating the libstdc++ as done above didn't solve the issue.
  • The order in which these Python libraries are called matters. I tried importing Fiona before Geopandas as suggested here, but it didn't solve the issue either.

The error I'm currently getting is:

ERROR: LoadError: InitError: Python: ImportError: /srv/julia/bin/../lib/julia/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/jovyan/Julia/fork/ODINN/.CondaPkg/env/lib/python3.10/site-packages/fiona/../../../libgdal.so.31)
Python stacktrace:
 [1] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.10/site-packages/fiona/collection.py:11
 [2] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.10/site-packages/fiona/__init__.py:86
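
To make the message above easier to read, here is a minimal Python sketch that splits it into three parts: the library that was loaded (the provider), the version tag it lacks, and the library that asked for it (the consumer). The function name and the group names are made up for illustration, and the pattern only assumes the message layout shown in this thread.

```python
import re

# Layout assumed from the error above:
#   <provider>: version `<tag>' not found (required by <consumer>)
_PATTERN = re.compile(
    r"(?P<provider>\S+): version `(?P<version>[^']+)' not found "
    r"\(required by (?P<consumer>[^)]+)\)"
)


def parse_version_error(message):
    """Return provider, version and consumer as a dict, or None if no match."""
    match = _PATTERN.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "ImportError: /srv/julia/bin/../lib/julia/libstdc++.so.6: "
        "version `GLIBCXX_3.4.30' not found (required by "
        "/home/jovyan/Julia/fork/ODINN/.CondaPkg/env/lib/python3.10/"
        "site-packages/fiona/../../../libgdal.so.31)"
    )
    for key, value in parse_version_error(sample).items():
        print(f"{key}: {value}")
```

Read this way, the provider is the libstdc++ bundled with Julia under /srv/julia, while the consumer is libgdal inside the CondaPkg environment.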

I'm quite lost on what to do next. Any ideas? Thanks a lot in advance!

@JordiBolibar do you have a reproducible example of what you want to work that doesn't work?

I think these may be relevant observations that may help you get past the issue:

  1. /srv/julia/bin/ is mentioned, and that is the Julia installation provided by the Docker image. This is very important to know, as another version of Julia may have been installed when installing Julia-related packages via conda. Based on experience, I suggest consistently working against that version of Julia and avoiding a conda-installed Julia.
  2. In @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.10/site-packages/fiona/__init__.py:86 it seems like a Python 3.10 version is involved. This could be okay, but Python environments are tricky. I suspect there can be a mismatch between the default environment in the image that provides Python 3.9, and the one you work with.
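
One way to check the second observation is to ask the running interpreter where it lives and which version it is. This is a minimal sketch using only the standard library; the function name is made up for illustration, and it has to be run by the Python that the Julia session actually loads for the answer to be meaningful.

```python
import sys


def describe_interpreter():
    """Collect the facts that identify which Python is actually running."""
    return {
        "executable": sys.executable,  # path of the interpreter binary
        "prefix": sys.prefix,          # root of the environment in use
        "version": "{}.{}".format(*sys.version_info[:2]),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the prefix points into .CondaPkg/env rather than the image's default environment, the two Python installations are indeed different.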

For me to help further, it is relevant to be aware of:

  • how the Python environment where fiona seems to live is set up
  • how I can reproduce the error end to end, including defining the python environment

My understanding is that you have installed the Julia package CondaPkg.jl, which in turn is used to set up a dedicated conda environment for your Julia project.

Maybe your issue can be resolved simply by using Python 3.9? I see that the Python project OGGM currently officially supports only up to py39, but it often works with py310 anyhow, so maybe that isn't an issue.

That's a good point. I reinstalled CondaPkg.jl using Python 3.9. Unfortunately, this doesn't seem to fix the issue, but there's some progress, since the error changed. Now it's scipy complaining:

ERROR: LoadError: InitError: Python: ImportError: /srv/julia/bin/../lib/julia/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/jovyan/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-39-x86_64-linux-gnu.so)
Python stacktrace:
 [1] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/fft/_pocketfft/basic.py:6
 [2] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/fft/_pocketfft/__init__.py:3
 [3] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/fft/_helper.py:3
 [4] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/fft/__init__.py:91
 [5] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/signal/windows/_windows.py:7
 [6] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/signal/windows/__init__.py:41
 [7] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/signal/__init__.py:309
 [8] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/oggm/cfg.py:17
 [9] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/oggm/utils/_downloads.py:58
 [10] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/oggm/utils/__init__.py:2
 [11] <module>
   @ ~/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/oggm/__init__.py:30

Since CondaPkg.jl is a julia package, I'm sure it is using the system Julia version. So the problem shouldn't come from there. I think it is an incompatibility between some system libraries from Linux and Python libraries.

ERROR: LoadError: InitError: Python: ImportError: /srv/julia/bin/../lib/julia/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/jovyan/Julia/fork/ODINN/.CondaPkg/env/lib/python3.9/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-39-x86_64-linux-gnu.so)

Maybe you must declare cpython as a dependency as well? I'm not sure. This is very tricky now, since you have also set up an entirely new Python environment, which can come with a declaration of LD_LIBRARY_PATH etc. that references various system resources.
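
To see whether LD_LIBRARY_PATH plays a role, a minimal Python sketch like the one below can list every copy of libstdc++.so.6 reachable through it. The function name is made up for illustration. Note the limitation: the dynamic loader also consults other sources (for example paths embedded in the binaries themselves), so an empty result here does not rule out a conflicting copy.

```python
import os
from pathlib import Path


def find_library(name, search_path):
    """Return the full path of `name` in each directory of a path list that has it."""
    hits = []
    for entry in search_path.split(os.pathsep):
        candidate = Path(entry) / name
        # Skip empty entries, which can appear with a leading or trailing separator.
        if entry and candidate.exists():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_library("libstdc++.so.6", current))
```

The hits come back in search order, so the first entry is the one LD_LIBRARY_PATH would favor.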

You are using CondaPkg.jl, if you don't do that and instead install things in the base environment, do you run into the same issues then? I'm still not clear on what you have in your CondaPkg.jl provided python environment. It would be great for me to have a small example of code to reproduce what you do locally to arrive at this.

Hi Erik, I've tried adding cpython as a dependency and it didn't fix things. This is pretty hellish to reproduce outside what I'm using now. I could try comparing my Python conda environment vs my Julia conda environment, but I'm unsure how to do this efficiently.
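
One possible way to do that comparison is to export both environments with `conda list --export`, which writes one `name=version=build` line per package, and diff the results. This is a minimal Python sketch under that assumption; the function names are made up for illustration.

```python
def parse_export(text):
    """Map package name -> version from `conda list --export` style lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment header conda writes at the top.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_envs(env_a, env_b):
    """Return {name: (version_in_a, version_in_b)} for packages that differ.

    None marks a package that is missing from one of the environments.
    """
    names = sorted(set(env_a) | set(env_b))
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in names
        if env_a.get(name) != env_b.get(name)
    }
```

Feeding it the two exported files (read with `open(...).read()`) gives a short list of packages to look at first, for example scipy or gdal if their versions differ.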

Some people suggest updating the libstdc++ version. Since I don't have admin rights, would you mind trying some of these commands to see if this fixes it (if you judge this to be safe)? Thanks again for your help!

@JordiBolibar hmmm I looked into the link to "some of these commands" but I don't want to go deeper into a rabbit hole without being able to reproduce the error myself. To install a pinned version of gcc for example, as provided 3 years ago, would perhaps break other things.

At this point if you want help resolving this, work to provide a reproducible example of the error you run into to allow me to explore more freely on how to resolve it.

Hi @consideRatio,

Let's see if you can reproduce it with these:

  1. First, you can use the following .yml file for the required conda environment:
name: oggm_env
channels:
  - oggm
  - conda-forge

dependencies:
  - numpy
  - pandas
  - xarray
  - netcdf4
  - oggm-deps
  - oggm
  - pip:
      - mbsandbox @ https://github.com/OGGM/massbalance-sandbox/archive/refs/heads/master.zip
  2. Then, you need to add and configure ODINN (the Julia package from our project), by following the instructions in the README: https://github.com/ODINN-SciML/ODINN.jl

  3. And finally, you need to run the following script in order to reproduce it.

Don't hesitate to get back to me (either here or on Slack) if you encounter any issues. Thanks again for your help!

A small update on this: I compared my GLIBCXX_3.4. and @facusapienza21's, and they matched. Then I used the same Julia environment as Facu, and I still got the same problem. That isolated the problem to the conda environment. After switching to Facu's conda environment I managed to work around this issue. I'm still not sure where it comes from, but it must be due to a particular version of scipy and potentially other libraries. I'll explore this later on once I have more time.

@JordiBolibar nice progress!

I'm not sure I understand you, but is it correct that when you replaced the conda environment declaration file below, with another conda environment declaration file, then you no longer got the error about GLIBCXX_3.4.30?

name: oggm_env
channels:
  - oggm
  - conda-forge

dependencies:
  - numpy
  - pandas
  - xarray
  - netcdf4
  - oggm-deps
  - oggm
  - pip:
      - mbsandbox @ https://github.com/OGGM/massbalance-sandbox/archive/refs/heads/master.zip

Yes, that's it. I used Facu's full conda environment from a .yml file to replace mine. Somehow I recently updated some Python libraries which were no longer compatible with the Julia libs.

Wieeee, nice tracking this down! Working with compatibility between software is hard just with one programming language, python + julia makes it a bit extra hard - nice work!

Is there an action point to take for me at this point?

Indeed! Julia + Python + conda is quite something!

For now I think I'm good. I think it can be solved with the right combination of packages, I will just have to narrow down where it comes from, but for now I'll just use Facu's environment. Ideally it would be nice to have GLIBCXX_3.4.30, since in the hub's Linux system it stops at GLIBCXX_3.4.28. Not sure how tricky or complicated that is to have.
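
For reference, one way to see which GLIBCXX tags a given libstdc++.so.6 carries is to scan the file for them, much like `strings <file> | grep GLIBCXX` would. This is a minimal Python sketch; the function names are made up for illustration. The scan reports every tag mentioned in the bytes, which for libstdc++ itself are the versions it defines.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4.30 embedded in the binary.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _GLIBCXX.finditer(data)
    }
    return sorted(found)


def provides(data, required):
    """True if the bytes mention the required tag, e.g. 'GLIBCXX_3.4.30'."""
    wanted = tuple(int(part) for part in required.split("_", 1)[1].split("."))
    return wanted in glibcxx_versions(data)


if __name__ == "__main__":
    # The two copies discussed in this thread; missing files are skipped.
    for lib in ("/srv/julia/lib/julia/libstdc++.so.6",
                "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"):
        path = Path(lib)
        if path.exists():
            data = path.read_bytes()
            print(lib, glibcxx_versions(data)[-1], provides(data, "GLIBCXX_3.4.30"))
```

Comparing the highest tag of the Julia-bundled copy with the system copy shows which of the two is the limiting one.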

A week ago I checked whether there was a newer version of the Ubuntu apt package libstdc++6, and there wasn't. As I understand it, that package is what determines whether GLIBCXX_<new version> is available.

It could be that a modern version of the package pyzmq came pre-compiled in a way that required an even newer version of GLIBCXX_ than what is provided via libstdc++6, and that led to the issues.

I think maybe updating to a more modern version of g++ compiler can help, but I'm very uneasy about making changes that may lead to breaking other things - it seems like a possible rabbit hole to stay out of.

Ideally, we would identify which version of pyzmq must be avoided to prevent issues like this until we have a more modern version of these C++ dependencies in the system. I'll look into this a bit more and then surrender if no progress is made.

@JordiBolibar maybe you could bundle an installation of gcc with the conda environment as well, messy but maybe. gcc is available in conda-forge. If you would use a modern version of gcc maybe it can provide you with GLIBCXX_3.4.30?

I think you need gcc version 12.1.0 or higher based on reading misc comments and seeing GLIBCXX_3.4.30 be associated with g++ 12.

It also seems that g++ 12 isn't available as an apt package for us to install in the system on Ubuntu 20.04, only on the newer Ubuntu 22.04.

I opened pangeo-data/pangeo-docker-images#352 about getting us towards Ubuntu 22.04. I expect this will take a bit more than a month though.

@JordiBolibar never mind, we have now upgraded to 22.04 via a quick update by @yuvipanda. I've verified that we have GLIBCXX_3.4.30 available now!