HPO studio notebook fails when importing cuDF
hcho3 commented
https://github.com/rapidsai/cloud-ml-examples/blob/main/aws/rapids_studio_hpo.ipynb
Traceback (most recent call last):
  File "train.py", line 76, in <module>
    train()
  File "train.py", line 27, in train
    ml_workflow = create_workflow(hpo_config)
  File "/opt/ml/code/MLWorkflow.py", line 37, in create_workflow
    from workflows.MLWorkflowSingleGPU import MLWorkflowSingleGPU
  File "/opt/ml/code/workflows/MLWorkflowSingleGPU.py", line 21, in <module>
    import cudf
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/__init__.py", line 71, in <module>
    from cudf.io import (
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/io/__init__.py", line 8, in <module>
    from cudf.io.orc import read_orc, read_orc_metadata, to_orc
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/io/orc.py", line 14, in <module>
    from cudf.utils.metadata import (  # type: ignore
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/utils/metadata/orc_column_statistics_pb2.py", line 7, in <module>
    from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/opt/conda/envs/rapids/lib/python3.8/site-packages/google/protobuf/internal/__init__.py)
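This is likely due to a protobuf version mismatch: the generated module orc_column_statistics_pb2.py imports google.protobuf.internal.builder, which only exists in protobuf 3.20 and later, so the protobuf runtime installed in this environment is probably older than the one the file was generated for.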
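One way to test the version-mismatch hypothesis inside the training container, without importing cuDF at all, is to ask whether the installed protobuf runtime provides google.protobuf.internal.builder and whether its version is at least 3.20 (the release that added that module). This is a minimal sketch assuming Python 3.8+ for importlib.metadata; the function names (parse_version, has_builder_module, protobuf_report) are hypothetical helpers, not part of cuDF or protobuf.

```python
"""Hypothetical diagnostic: can the installed protobuf load builder-style *_pb2.py files?"""
import importlib.util
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # stop at suffixes such as "0rc1"
            break
    return tuple(parts)


def has_builder_module():
    """True if google.protobuf.internal.builder can be located (imports only its parents)."""
    try:
        return importlib.util.find_spec("google.protobuf.internal.builder") is not None
    except ImportError:
        # google or google.protobuf is missing or broken.
        return False


def protobuf_report(min_version=(3, 20)):
    """Summarize the installed protobuf runtime; 3.20 is assumed as the minimum."""
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        installed = None
    return {
        "installed": installed,
        "meets_minimum": installed is not None and parse_version(installed) >= min_version,
        "has_builder": has_builder_module(),
    }


if __name__ == "__main__":
    print(protobuf_report())
```

If has_builder comes back False in the container, upgrading protobuf to 3.20 or later (or using a cuDF build whose _pb2 files were generated with an older protoc) would be a plausible fix, though which one fits this image is an assumption to verify.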
jacobtomlinson commented
The title says Parquet, but the traceback shows cudf/io/orc.py, which suggests the data is ORC.
hcho3 commented
Actually, the error occurs when import cudf is executed, before read_parquet is ever called. Importing cuDF also imports cudf.io.orc, and that import is what fails.
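The mechanism described above, where a top-level import fails because the package's __init__.py eagerly imports a broken submodule, can be reproduced with a throwaway package. This sketch builds a hypothetical toypkg in a temporary directory whose import chain mirrors cudf -> cudf.io -> cudf.io.orc; the missing name is simulated with from json import builder, standing in for the missing protobuf builder module.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def build_toy_package(root):
    """Create hypothetical toypkg/__init__.py -> toypkg/io/__init__.py -> toypkg/io/orc.py."""
    pkg = Path(root) / "toypkg"
    (pkg / "io").mkdir(parents=True)
    (pkg / "__init__.py").write_text("from toypkg.io import read_orc\n")
    (pkg / "io" / "__init__.py").write_text("from toypkg.io.orc import read_orc\n")
    # json has no 'builder', so this line raises ImportError, like the protobuf import.
    (pkg / "io" / "orc.py").write_text(textwrap.dedent("""
        from json import builder
        def read_orc(path):
            return path
    """))


def try_import():
    """Import toypkg; return the ImportError message, or None if the import succeeds."""
    with tempfile.TemporaryDirectory() as root:
        build_toy_package(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            import toypkg  # noqa: F401  (never calls read_orc, yet still fails)
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(root)
            for name in [m for m in sys.modules if m == "toypkg" or m.startswith("toypkg.")]:
                del sys.modules[name]
    return None


if __name__ == "__main__":
    print(try_import())
```

The failure surfaces at import toypkg even though read_orc is never called, which matches the traceback: the ORC module breaks import cudf regardless of whether the workflow reads Parquet or ORC.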
jacobtomlinson commented
That makes sense. Since it is the import itself that fails, do you think it would be worth opening an issue on cuDF about this?
hcho3 commented
I think this issue is specific to this particular environment. Let me verify.