databricks/koalas

Issues converting series to 'object' dtype

Opened this issue · 1 comment

Is it possible to convert a Koalas Series to have an object dtype? I have tried this, but get an error as shown below. Is there a way to do this?

>>> import databricks.koalas as ks
>>> ks_ser = ks.Series([1, 2, 3])
>>> ks_ser_obj = ks_ser.astype('object')
>>> assert ks_ser_obj.dtype == 'object'
>>> ks_ser_obj.to_pandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/lib/python3.8/site-packages/databricks/koalas/series.py", line 1490, in to_pandas
    return self._to_internal_pandas().copy()
  File "/env/lib/python3.8/site-packages/databricks/koalas/series.py", line 6059, in _to_internal_pandas
    return self._kdf._internal.to_pandas_frame[self.name]
  File "/env/lib/python3.8/site-packages/databricks/koalas/utils.py", line 545, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/env/lib/python3.8/site-packages/databricks/koalas/internal.py", line 933, in to_pandas_frame
    sdf = self.to_internal_spark_frame
  File "/env/lib/python3.8/site-packages/databricks/koalas/utils.py", line 545, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/env/lib/python3.8/site-packages/databricks/koalas/internal.py", line 921, in to_internal_spark_frame
    zip(self.column_labels, self.data_spark_columns, self.data_spark_column_names)
  File "/env/lib/python3.8/site-packages/databricks/koalas/utils.py", line 545, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/env/lib/python3.8/site-packages/databricks/koalas/internal.py", line 845, in data_spark_column_names
    return self.spark_frame.select(self.data_spark_columns).columns
  File "/env/lib/python3.8/site-packages/pyspark/sql/dataframe.py", line 1669, in select
    jdf = self._jdf.select(self._jcols(*cols))
  File "/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/env/lib/python3.8/site-packages/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: cannot resolve '`0`' due to data type mismatch: cannot cast bigint to array<string>;
'Project [cast(0#18L as array<string>) AS __none__#25]
+- Project [__index_level_0__#17L, 0#18L, monotonically_increasing_id() AS __natural_order__#21L]
   +- LogicalRDD [__index_level_0__#17L, 0#18L], false

Yeah, the object dtype is not supported.
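
For anyone hitting this, a possible workaround (just a sketch, not an official Koalas recommendation) is to cast to a Spark-supported type such as str, or to convert to pandas first and apply the object dtype on the pandas side:

>>> import databricks.koalas as ks
>>> ks_ser = ks.Series([1, 2, 3])
>>> # Cast to str instead of object; Spark stores it as a string column,
>>> # and pandas represents strings with the object dtype anyway.
>>> ks_ser.astype(str).to_pandas().dtype
dtype('O')
>>> # Or convert to pandas first and cast on the pandas side.
>>> ks_ser.to_pandas().astype('object').dtype
dtype('O')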