Knowledge-Graph-Hub/kg-phenio

KG build failing in transform step due to numpy issues

As of the most recent build (121), the KG build fails at the beginning of the transform step (long stack trace follows):

12:05:31  + python3.9 run.py transform
12:05:32  
12:05:32  A module that was compiled using NumPy 1.x cannot be run in
12:05:32  NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
12:05:32  versions of NumPy, modules must be compiled with NumPy 2.0.
12:05:32  Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
12:05:32  
12:05:32  If you are a user of the module, the easiest solution will be to
12:05:32  downgrade to 'numpy<2' or try to upgrade the affected module.
12:05:32  We expect that some modules will need time to support NumPy 2.
12:05:32  
12:05:32  Traceback (most recent call last):  File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:32      from kg_phenio import download as kg_download
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:32      from .transform import transform
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:32      from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:32      from .phenio_transform import PhenioTransform
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:32      from kgx.cli.cli_utils import transform  # type: ignore
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:32      from kgx.cli.cli_utils import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 11, in <module>
12:05:32      from kgx.validator import Validator
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/validator.py", line 14, in <module>
12:05:32      from kgx.utils.kgx_utils import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/utils/kgx_utils.py", line 22, in <module>
12:05:32      import pandas as pd
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/__init__.py", line 26, in <module>
12:05:32      from pandas.compat import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/compat/__init__.py", line 27, in <module>
12:05:32      from pandas.compat.pyarrow import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/compat/pyarrow.py", line 8, in <module>
12:05:32      import pyarrow as pa
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:32      import pyarrow.lib as _lib
12:05:32  AttributeError: _ARRAY_API not found
12:05:32  
12:05:32  A module that was compiled using NumPy 1.x cannot be run in
12:05:32  NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
12:05:32  versions of NumPy, modules must be compiled with NumPy 2.0.
12:05:32  Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
12:05:32  
12:05:32  If you are a user of the module, the easiest solution will be to
12:05:32  downgrade to 'numpy<2' or try to upgrade the affected module.
12:05:32  We expect that some modules will need time to support NumPy 2.
12:05:32  
12:05:32  Traceback (most recent call last):  File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:32      from kg_phenio import download as kg_download
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:32      from .transform import transform
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:32      from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:32      from .phenio_transform import PhenioTransform
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:32      from kgx.cli.cli_utils import transform  # type: ignore
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:32      from kgx.cli.cli_utils import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 11, in <module>
12:05:32      from kgx.validator import Validator
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/validator.py", line 14, in <module>
12:05:32      from kgx.utils.kgx_utils import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/utils/kgx_utils.py", line 22, in <module>
12:05:32      import pandas as pd
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/__init__.py", line 49, in <module>
12:05:32      from pandas.core.api import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/core/api.py", line 9, in <module>
12:05:32      from pandas.core.dtypes.dtypes import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py", line 24, in <module>
12:05:32      from pandas._libs import (
12:05:32    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:32      import pyarrow.lib as _lib
12:05:32  AttributeError: _ARRAY_API not found
12:05:33  
12:05:33  A module that was compiled using NumPy 1.x cannot be run in
12:05:33  NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
12:05:33  versions of NumPy, modules must be compiled with NumPy 2.0.
12:05:33  Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
12:05:33  
12:05:33  If you are a user of the module, the easiest solution will be to
12:05:33  downgrade to 'numpy<2' or try to upgrade the affected module.
12:05:33  We expect that some modules will need time to support NumPy 2.
12:05:33  
12:05:33  Traceback (most recent call last):  File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:33      from kg_phenio import download as kg_download
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:33      from .transform import transform
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:33      from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:33      from .phenio_transform import PhenioTransform
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:33      from kgx.cli.cli_utils import transform  # type: ignore
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:33      from kgx.cli.cli_utils import (
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 12, in <module>
12:05:33      from kgx.sink import Sink
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/__init__.py", line 7, in <module>
12:05:33      from .parquet_sink import ParquetSink
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/parquet_sink.py", line 7, in <module>
12:05:33      from pyarrow import Table
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:33      import pyarrow.lib as _lib
12:05:33  AttributeError: _ARRAY_API not found
12:05:33  Traceback (most recent call last):
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:33      from kg_phenio import download as kg_download
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:33      from .transform import transform
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:33      from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:33      from .phenio_transform import PhenioTransform
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:33      from kgx.cli.cli_utils import transform  # type: ignore
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:33      from kgx.cli.cli_utils import (
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 12, in <module>
12:05:33      from kgx.sink import Sink
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/__init__.py", line 7, in <module>
12:05:33      from .parquet_sink import ParquetSink
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/parquet_sink.py", line 7, in <module>
12:05:33      from pyarrow import Table
12:05:33    File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:33      import pyarrow.lib as _lib
12:05:33    File "pyarrow/lib.pyx", line 36, in init pyarrow.lib
12:05:33  ImportError: numpy.core.multiarray failed to import

NumPy is an upstream dependency here, but some of our dependencies already pin it to <2, so the easiest option would be to pin numpy<2 here as well.
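To see which installed packages declare a NumPy requirement, and with what bounds, something like the following could be run inside the build's virtualenv. This is only a sketch: the helper names `numpy_requirements` and `report_numpy_constraints` are made up for illustration, and it relies on nothing beyond the standard-library `importlib.metadata`.

```python
import re
from importlib.metadata import distributions

# Matches requirement strings for the "numpy" distribution itself,
# not for packages such as "numpydoc" or "numpy-financial".
_NUMPY_REQ = re.compile(r"numpy(?![\w.-])", re.IGNORECASE)


def numpy_requirements(requires):
    """Return the entries of a requirements list that refer to numpy."""
    return [req for req in (requires or []) if _NUMPY_REQ.match(req.strip())]


def report_numpy_constraints():
    """Map each installed distribution to its declared numpy requirements."""
    report = {}
    for dist in distributions():
        found = numpy_requirements(dist.requires)
        if found:
            # Fall back to a placeholder if the metadata has no Name field.
            name = dist.metadata["Name"] or "<unknown>"
            report[name] = found
    return report


if __name__ == "__main__":
    for name, reqs in sorted(report_numpy_constraints().items()):
        print(f"{name}: {reqs}")
```

Packages that list NumPy without an upper bound are the ones that let the resolver pick 2.x.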

This issue did not appear with the Oct 8 build but it reappeared on the Nov 1 build. Strange!

I struggled a lot with this over the last week (see my post in #codestyle, where I was desperately trying to ensure pandas 1.x compatibility with sssom), but since you are using Python 3.9, I think these might only be tangentially related.

I fixed it by pinning numpy<2 for now, though obviously that won't work forever.
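Since the pin is a stopgap, a small pre-flight check could make a future regression fail with one clear message instead of several stacked tracebacks. The sketch below is an assumption about how that might look, not something in the repo: `check_numpy_pin` and the other function names are hypothetical, and the idea is to call it before anything imports pandas or pyarrow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.0.2'."""
    return int(version_string.split(".")[0])


def installed_numpy_version() -> Optional[str]:
    """Return the installed numpy version, or None if numpy is absent."""
    try:
        return version("numpy")
    except PackageNotFoundError:
        return None


def check_numpy_pin(installed: Optional[str]) -> None:
    """Raise RuntimeError if the given numpy version violates numpy<2."""
    if installed is not None and major_version(installed) >= 2:
        raise RuntimeError(
            f"numpy {installed} is installed, but this build expects numpy<2"
        )


if __name__ == "__main__":
    check_numpy_pin(installed_numpy_version())
```

Once pyarrow and the other compiled dependencies ship builds that support NumPy 2, both the pin and a check like this can be dropped.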