rapidsai/cloud-ml-examples

HPO studio notebook fails when importing cuDF

Opened this issue · 4 comments

hcho3 commented

https://github.com/rapidsai/cloud-ml-examples/blob/main/aws/rapids_studio_hpo.ipynb

Traceback (most recent call last):
  File "train.py", line 76, in <module>
    train()
  File "train.py", line 27, in train
    ml_workflow = create_workflow(hpo_config)
  File "/opt/ml/code/MLWorkflow.py", line 37, in create_workflow
    from workflows.MLWorkflowSingleGPU import MLWorkflowSingleGPU
  File "/opt/ml/code/workflows/MLWorkflowSingleGPU.py", line 21, in <module>
    import cudf
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/__init__.py", line 71, in <module>
    from cudf.io import (
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/io/__init__.py", line 8, in <module>
    from cudf.io.orc import read_orc, read_orc_metadata, to_orc
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/io/orc.py", line 14, in <module>
    from cudf.utils.metadata import (  # type: ignore
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/utils/metadata/orc_column_statistics_pb2.py", line 7, in <module>
    from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/opt/conda/envs/rapids/lib/python3.8/site-packages/google/protobuf/internal/__init__.py)

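This is likely due to a mismatch in Protobuf versions: the generated orc_column_statistics_pb2.py imports google.protobuf.internal.builder, which the Protobuf package installed in this environment does not provide.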
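A quick way to check this hypothesis inside the failing environment is to print the installed Protobuf version and test whether the builder module is importable. This is a minimal sketch, not part of the notebook; the helper name protobuf_report is hypothetical, and the belief that builder ships only with newer Protobuf releases (3.20 and later, as far as I know) is an assumption to confirm against the Protobuf release notes.

```python
import importlib.util


def protobuf_report():
    """Return (protobuf version or None, whether internal.builder can be found).

    Hypothetical diagnostic helper; it only inspects the current environment.
    """
    try:
        import google.protobuf as protobuf
    except ImportError:
        # Protobuf is not installed at all.
        return None, False
    version = getattr(protobuf, "__version__", "unknown")
    try:
        spec = importlib.util.find_spec("google.protobuf.internal.builder")
        has_builder = spec is not None
    except ModuleNotFoundError:
        # A parent package could not be imported.
        has_builder = False
    return version, has_builder


if __name__ == "__main__":
    version, has_builder = protobuf_report()
    print(f"protobuf version: {version}")
    print(f"google.protobuf.internal.builder available: {has_builder}")
```

If the second line reports False while cuDF's generated _pb2 module expects builder, the environment's Protobuf is older than what that cuDF build was generated against.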

The title says Parquet, but the traceback shows cudf/io/orc.py, which suggests the data is ORC.

hcho3 commented

Actually, the error occurs when import cudf is executed (before read_parquet is called). Importing cuDF also imports cudf.io.orc, and it is that import that fails, regardless of the data format.
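The mechanism can be reproduced without cuDF. The sketch below builds a throwaway package (toypkg and missing_dependency_for_demo are made-up names, not real libraries) whose top-level __init__.py imports an io subpackage, which in turn imports an orc module with an unsatisfied dependency. Importing the top-level package fails even though nothing ORC-related is ever called.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_fails_at_package_level():
    """Import a toy package whose orc submodule has a broken dependency.

    Returns the ImportError message, or None if the import succeeded.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "toypkg"
        (pkg / "io").mkdir(parents=True)
        # Top-level __init__ eagerly imports the io subpackage.
        (pkg / "__init__.py").write_text("from toypkg.io import read_orc\n")
        (pkg / "io" / "__init__.py").write_text(
            "from toypkg.io.orc import read_orc\n"
        )
        # The orc module depends on something that is not installed.
        (pkg / "io" / "orc.py").write_text(
            "from missing_dependency_for_demo import builder\n"
            "def read_orc(path):\n"
            "    return path\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("toypkg")
        except ImportError as exc:
            return str(exc)
        finally:
            # Undo the sys.path change and drop any partially imported modules.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules
                         if n == "toypkg" or n.startswith("toypkg.")]:
                del sys.modules[name]
    return None


if __name__ == "__main__":
    print(import_fails_at_package_level())
```

The same chain appears in the traceback: cudf/__init__.py imports cudf.io, which imports cudf.io.orc, which imports the generated Protobuf module.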

That makes sense. Do you think it would be worth opening an issue on cudf about this, since it is the import itself that fails?

hcho3 commented

I think this issue is specific to the particular environment. Let me verify.