KG build failing in transform step due to numpy issues
Closed this issue · 3 comments
caufieldjh commented
As of the most recent build (121), the KG build fails at the beginning of the transform step (long stack trace follows):
12:05:31 + python3.9 run.py transform
12:05:32
12:05:32 A module that was compiled using NumPy 1.x cannot be run in
12:05:32 NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
12:05:32 versions of NumPy, modules must be compiled with NumPy 2.0.
12:05:32 Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
12:05:32
12:05:32 If you are a user of the module, the easiest solution will be to
12:05:32 downgrade to 'numpy<2' or try to upgrade the affected module.
12:05:32 We expect that some modules will need time to support NumPy 2.
12:05:32
12:05:32 Traceback (most recent call last): File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:32 from kg_phenio import download as kg_download
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:32 from .transform import transform
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:32 from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:32 from .phenio_transform import PhenioTransform
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:32 from kgx.cli.cli_utils import transform # type: ignore
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:32 from kgx.cli.cli_utils import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 11, in <module>
12:05:32 from kgx.validator import Validator
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/validator.py", line 14, in <module>
12:05:32 from kgx.utils.kgx_utils import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/utils/kgx_utils.py", line 22, in <module>
12:05:32 import pandas as pd
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/__init__.py", line 26, in <module>
12:05:32 from pandas.compat import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/compat/__init__.py", line 27, in <module>
12:05:32 from pandas.compat.pyarrow import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/compat/pyarrow.py", line 8, in <module>
12:05:32 import pyarrow as pa
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:32 import pyarrow.lib as _lib
12:05:32 AttributeError: _ARRAY_API not found
12:05:32
12:05:32 A module that was compiled using NumPy 1.x cannot be run in
12:05:32 NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
12:05:32 versions of NumPy, modules must be compiled with NumPy 2.0.
12:05:32 Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
12:05:32
12:05:32 If you are a user of the module, the easiest solution will be to
12:05:32 downgrade to 'numpy<2' or try to upgrade the affected module.
12:05:32 We expect that some modules will need time to support NumPy 2.
12:05:32
12:05:32 Traceback (most recent call last): File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:32 from kg_phenio import download as kg_download
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:32 from .transform import transform
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:32 from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:32 from .phenio_transform import PhenioTransform
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:32 from kgx.cli.cli_utils import transform # type: ignore
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:32 from kgx.cli.cli_utils import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 11, in <module>
12:05:32 from kgx.validator import Validator
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/validator.py", line 14, in <module>
12:05:32 from kgx.utils.kgx_utils import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/utils/kgx_utils.py", line 22, in <module>
12:05:32 import pandas as pd
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/__init__.py", line 49, in <module>
12:05:32 from pandas.core.api import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/core/api.py", line 9, in <module>
12:05:32 from pandas.core.dtypes.dtypes import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py", line 24, in <module>
12:05:32 from pandas._libs import (
12:05:32 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:32 import pyarrow.lib as _lib
12:05:32 AttributeError: _ARRAY_API not found
12:05:33
12:05:33 A module that was compiled using NumPy 1.x cannot be run in
12:05:33 NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
12:05:33 versions of NumPy, modules must be compiled with NumPy 2.0.
12:05:33 Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
12:05:33
12:05:33 If you are a user of the module, the easiest solution will be to
12:05:33 downgrade to 'numpy<2' or try to upgrade the affected module.
12:05:33 We expect that some modules will need time to support NumPy 2.
12:05:33
12:05:33 Traceback (most recent call last): File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:33 from kg_phenio import download as kg_download
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:33 from .transform import transform
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:33 from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:33 from .phenio_transform import PhenioTransform
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:33 from kgx.cli.cli_utils import transform # type: ignore
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:33 from kgx.cli.cli_utils import (
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 12, in <module>
12:05:33 from kgx.sink import Sink
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/__init__.py", line 7, in <module>
12:05:33 from .parquet_sink import ParquetSink
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/parquet_sink.py", line 7, in <module>
12:05:33 from pyarrow import Table
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:33 import pyarrow.lib as _lib
12:05:33 AttributeError: _ARRAY_API not found
12:05:33 Traceback (most recent call last):
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/run.py", line 5, in <module>
12:05:33 from kg_phenio import download as kg_download
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/__init__.py", line 3, in <module>
12:05:33 from .transform import transform
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform.py", line 5, in <module>
12:05:33 from kg_phenio.transform_utils.phenio.phenio_transform import PhenioTransform
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/__init__.py", line 2, in <module>
12:05:33 from .phenio_transform import PhenioTransform
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/kg_phenio/transform_utils/phenio/phenio_transform.py", line 8, in <module>
12:05:33 from kgx.cli.cli_utils import transform # type: ignore
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/__init__.py", line 7, in <module>
12:05:33 from kgx.cli.cli_utils import (
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/cli/cli_utils.py", line 12, in <module>
12:05:33 from kgx.sink import Sink
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/__init__.py", line 7, in <module>
12:05:33 from .parquet_sink import ParquetSink
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/kgx/sink/parquet_sink.py", line 7, in <module>
12:05:33 from pyarrow import Table
12:05:33 File "/var/lib/jenkins/workspace/ledge-graph-hub_kg-phenio_master/gitrepo/venv/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
12:05:33 import pyarrow.lib as _lib
12:05:33 File "pyarrow/lib.pyx", line 36, in init pyarrow.lib
12:05:33 ImportError: numpy.core.multiarray failed to import
NumPy is an upstream dependency here, but some dependencies already pin it to <2.
So the easiest option would be to pin numpy<2 in this project as well.
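To see which installed packages declare a NumPy requirement (and which of them already carry a `<2` bound), something like the following could be run inside the build's venv. This is only a rough sketch: the helper names `is_numpy_requirement` and `numpy_requirements` are made up for illustration, and the regex is a simple approximation of requirement-string parsing, not a full parser.

```python
import re
from importlib.metadata import distributions


def is_numpy_requirement(req):
    """Return True if a requirement string names numpy itself.

    The negative lookahead keeps packages such as numpydoc or
    numpy-financial from matching.
    """
    return re.match(r"numpy(?![\w.-])", req.strip(), flags=re.IGNORECASE) is not None


def numpy_requirements():
    """Map each installed distribution name to its declared numpy requirements."""
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        # dist.requires is None when a package declares no dependencies.
        for req in dist.requires or []:
            if is_numpy_requirement(req):
                found.setdefault(name, []).append(req)
    return found


if __name__ == "__main__":
    for name, reqs in sorted(numpy_requirements().items(), key=lambda kv: str(kv[0])):
        print(name, "->", "; ".join(reqs))
```

Any package listed without an upper bound on numpy is a candidate for having pulled in NumPy 2.x during dependency resolution.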
caufieldjh commented
This issue did not appear with the Oct 8 build but it reappeared on the Nov 1 build. Strange!
matentzn commented
I struggled a lot with this over the last week (see my post in #codestyle, where I was desperately trying to ensure pandas 1.x compatibility with sssom), but I think since you are using Python 3.9 these might only be tangentially related.
caufieldjh commented
I fixed it by pinning numpy<2 for now. Obviously that won't work forever.
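Since the pin is temporary, a small preflight check before the transform step could fail fast with a readable message instead of the long import traceback. A minimal sketch, assuming plain `major.minor.patch` version strings; `major_version` and `check_numpy_pin` are hypothetical helpers, not part of kg_phenio:

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '2.0.2'."""
    return int(version_string.split(".", 1)[0])


def check_numpy_pin(installed=None):
    """Return a warning message if numpy 2.x is installed, otherwise None.

    `installed` can be passed explicitly for testing; by default the
    version is read from the installed package metadata.
    """
    if installed is None:
        try:
            installed = version("numpy")
        except PackageNotFoundError:
            return "numpy is not installed"
    if major_version(installed) >= 2:
        return (
            f"numpy {installed} is installed, but this build expects numpy<2; "
            "check that the pin is still applied"
        )
    return None


if __name__ == "__main__":
    problem = check_numpy_pin()
    if problem:
        raise SystemExit(problem)
```

Reading the version from package metadata avoids importing numpy or pyarrow, so the check itself cannot trigger the `_ARRAY_API not found` error.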