tensorflow/io

RuntimeError: std::bad_cast when using `from_parquet`

fedaeho opened this issue · 1 comment

Hi, I just installed tensorflow-io and tried to load a Parquet file.

import pandas as pd
import tensorflow as tf
import tensorflow_io as tfio

pd.DataFrame({'a':[.1,.2], 'b':[.01,.02]}).to_parquet('file.parquet')

df = pd.read_parquet('file.parquet') # works well
ds = tfio.IODataset.from_parquet('file.parquet', columns = ['a','b']) # Error

But it returns the following error.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 ds = tfio.IODataset.from_parquet(TRAIN_LIST[1])

File ~/.pyenv/versions/3.9.15/lib/python3.9/site-packages/tensorflow_io/python/ops/io_dataset.py:275, in IODataset.from_parquet(cls, filename, columns, **kwargs)
    262 """Creates an `IODataset` from a Parquet file.
    263 
    264 Args:
   (...)
    272 
    273 """
    274 with tf.name_scope(kwargs.get("name", "IOFromParquet")):
--> 275     return parquet_dataset_ops.ParquetIODataset(
    276         filename, columns=columns, internal=True
    277     )

File ~/.pyenv/versions/3.9.15/lib/python3.9/site-packages/tensorflow_io/python/ops/parquet_dataset_ops.py:30, in ParquetIODataset.__init__(self, filename, columns, internal)
     28 assert internal
     29 with tf.name_scope("ParquetIODataset"):
---> 30     components, shapes, dtypes = core_ops.io_parquet_readable_info(
     31         filename, shared=filename, container="ParquetIODataset"
     32     )
     34     def component_f(components, column):
     35         component = tf.boolean_mask(
     36             components, tf.math.equal(components, column)
     37         )[0]

File ~/.pyenv/versions/3.9.15/lib/python3.9/site-packages/tensorflow_io/python/ops/__init__.py:88, in LazyLoader.__getattr__(self, attrb)
     87 def __getattr__(self, attrb):
---> 88     return getattr(self._load(), attrb)

File ~/.pyenv/versions/3.9.15/lib/python3.9/site-packages/tensorflow_io/python/ops/__init__.py:84, in LazyLoader._load(self)
     82 def _load(self):
     83     if self._mod is None:
---> 84         self._mod = _load_library(self._library)
     85     return self._mod

File ~/.pyenv/versions/3.9.15/lib/python3.9/site-packages/tensorflow_io/python/ops/__init__.py:64, in _load_library(filename, lib)
     62 for f in filenames:
     63     try:
---> 64         l = load_fn(f)
     65         if l is not None:
     66             return l

File ~/.pyenv/versions/3.9.15/lib/python3.9/site-packages/tensorflow/python/framework/load_library.py:54, in load_op_library(library_filename)
     31 @tf_export('load_op_library')
     32 def load_op_library(library_filename):
     33   """Loads a TensorFlow plugin, containing custom ops and kernels.
     34 
     35   Pass "library_filename" to a platform-specific mechanism for dynamically
   (...)
     52     RuntimeError: when unable to load the library or get the python wrappers.
     53   """
---> 54   lib_handle = py_tf.TF_LoadLibrary(library_filename)
     55   try:
     56     wrappers = _pywrap_python_op_gen.GetPythonWrappers(
     57         py_tf.TF_GetOpList(lib_handle))

RuntimeError: std::bad_cast

Below are the tensorflow and tensorflow-io versions I use. I also tried tensorflow==2.11.0 with tensorflow-io<0.32, but it still returns the same error. (OS: CentOS 7)

tensorflow==2.12.0
tensorflow-estimator==2.12.0
tensorflow-io==0.32.0
tensorflow-io-gcs-filesystem==0.32.0
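
For anyone comparing environments, here is a minimal sketch of how the same version list could be collected programmatically. It only uses the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of tensorflow-io.

```python
from importlib import metadata

# Distribution names taken from the version list in this report.
PACKAGES = [
    "tensorflow",
    "tensorflow-estimator",
    "tensorflow-io",
    "tensorflow-io-gcs-filesystem",
]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print in the same "name==version" form used in this report.
for name, version in installed_versions().items():
    print(f"{name}=={version}" if version else f"{name} (not installed)")
```

Running this on both the CentOS 7 and the Ubuntu 18.04 machine should show whether the two environments really have the same package versions.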

I inspected the issue further and found that the error is raised when loading "python3.9/site-packages/tensorflow_io/python/ops/libtensorflow_io.so".
It doesn't happen when I run the same code on Ubuntu 18.04, but it does happen on CentOS 7.
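
A rough sketch of how the library load can be reproduced in isolation, without going through `from_parquet`, and how some platform details can be gathered for comparing the two systems. The helper names (`find_tfio_library`, `environment_summary`, `try_load`) are hypothetical; the path layout is the one shown in the traceback, and `tf.load_op_library` is the call at which the traceback fails. This does not identify the cause; it only narrows the reproduction.

```python
import importlib.util
import platform
from pathlib import Path


def find_tfio_library():
    """Return the path to libtensorflow_io.so, or None if it cannot be found.

    Uses find_spec so that tensorflow_io itself is not imported.
    """
    try:
        spec = importlib.util.find_spec("tensorflow_io")
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for root in spec.submodule_search_locations:
        # Layout as seen in the traceback: tensorflow_io/python/ops/
        candidate = Path(root) / "python" / "ops" / "libtensorflow_io.so"
        if candidate.exists():
            return candidate
    return None


def environment_summary():
    """Collect platform details worth comparing between CentOS 7 and Ubuntu 18.04."""
    return {
        "platform": platform.platform(),
        "libc": platform.libc_ver(),  # (name, version); may be empty strings
        "python": platform.python_version(),
        "tfio_library": find_tfio_library(),
    }


def try_load(path):
    """Load the op library directly; on the affected system this is expected to raise."""
    import tensorflow as tf  # imported lazily so the helpers above work without it

    return tf.load_op_library(str(path))


print(environment_summary())
```

If `find_tfio_library()` returns a path, calling `try_load(path)` on the CentOS 7 machine should hit the same `RuntimeError: std::bad_cast` directly.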

Also, when I re-run tfio.IODataset.from_parquet after the RuntimeError, the kernel seems to freeze.
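
To keep the notebook kernel usable while investigating, one option is to run the failing call in a separate Python process with a timeout. The sketch below uses only the standard-library `subprocess` module; `run_isolated` and `REPRO` are hypothetical names, and the 60-second default timeout is an arbitrary assumption.

```python
import subprocess
import sys


def run_isolated(code, timeout=60.0):
    """Run `code` in a fresh Python process.

    Returns (returncode, stdout, stderr), or None if the process
    did not finish within `timeout` seconds (it is killed in that case).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode, result.stdout, result.stderr


# The failing call from this report, as a standalone script.
REPRO = (
    "import tensorflow_io as tfio\n"
    "tfio.IODataset.from_parquet('file.parquet', columns=['a', 'b'])\n"
)
```

Calling `run_isolated(REPRO)` returns `None` if the child process hangs, and otherwise returns the exit code together with the captured output, so the traceback ends up in the third element instead of in the notebook kernel.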

same error