Datashader and polygons: Cannot interpret MultiPolygonDtype(float64) as data type
johannesnauta opened this issue · 3 comments
Description of expected behavior and the observed behavior
I get errors when following the exact code outlined in the tutorial on working with polygons in Datashader.
More specifically the error boils down to a specific polygon data type:
TypeError: Cannot interpret 'MultiPolygonDtype(float64)' as a data type
The only related issue I found was a spatialpandas GitHub issue, which suggested downgrading to older versions of geopandas. However, this did nothing on my systems, and since that issue was raised more than 2 years ago, I do not think downgrading multiple versions would be useful.
I have also tried converting the datatype to other datatypes that Datashader might understand, but have yet to succeed in producing anything.
Why does the example given by the Datashader developers fail on my end with a typing error? Is this an underlying issue with different versions of the libraries used?
Complete, minimal, self-contained example code that reproduces the issue
import pandas as pd
import numpy as np
import dask.dataframe as dd
import colorcet as cc
import datashader as ds
import datashader.transfer_functions as tf
import spatialpandas as spd
import spatialpandas.geometry
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world.to_crs(epsg=4087) # simple cylindrical projection
world['boundary'] = world.geometry.boundary
world['centroid'] = world.geometry.centroid
# Convert the geopandas GeoDataFrame to spatialpandas GeoDataFrame for Datashader to use
df_world = spd.GeoDataFrame(world)
cvs = ds.Canvas(plot_width=650, plot_height=400)
agg = cvs.polygons(df_world, geometry='geometry', agg=ds.mean('pop_est'))
tf.shade(agg)
Output with error message
TypeError Traceback (most recent call last)
Cell In [60], line 19
16 # Convert the geopandas GeoDataFrame to spatialpandas GeoDataFrame for Datashader to use
17 df_world = spd.GeoDataFrame(world)
---> 19 tf.shade(cvs.polygons(df_world, geometry='geometry', agg=ds.mean('pop_est')))
20 cvs = ds.Canvas(plot_width=650, plot_height=400)
21 agg = cvs.polygons(df_world, geometry='geometry', agg=ds.mean('pop_est'))
File ~/.local/lib/python3.10/site-packages/datashader/core.py:752, in Canvas.polygons(self, source, geometry, agg)
750 agg = any_rdn()
751 glyph = PolygonGeom(geometry)
--> 752 return bypixel(source, self, glyph, agg)
File ~/.local/lib/python3.10/site-packages/datashader/core.py:1265, in bypixel(source, canvas, glyph, agg)
1263 if len(cols_to_keep) < len(source.columns):
1264 source = source[cols_to_keep]
-> 1265 dshape = dshape_from_pandas(source)
1266 elif isinstance(source, dd.DataFrame):
1267 dshape = dshape_from_dask(source)
File ~/.local/lib/python3.10/site-packages/datashader/utils.py:442, in dshape_from_pandas(df)
440 def dshape_from_pandas(df):
441 """Return a datashape.DataShape object given a pandas dataframe."""
--> 442 return len(df) * datashape.Record([(k, dshape_from_pandas_helper(df[k]))
443 for k in df.columns])
File ~/.local/lib/python3.10/site-packages/datashader/utils.py:442, in <listcomp>(.0)
440 def dshape_from_pandas(df):
441 """Return a datashape.DataShape object given a pandas dataframe."""
--> 442 return len(df) * datashape.Record([(k, dshape_from_pandas_helper(df[k]))
443 for k in df.columns])
File ~/.local/lib/python3.10/site-packages/datashader/utils.py:433, in dshape_from_pandas_helper(col)
431 elif isinstance(col.dtype, (RaggedDtype, GeometryDtype)):
432 return col.dtype
--> 433 dshape = datashape.CType.from_numpy_dtype(col.dtype)
434 dshape = datashape.string if dshape == datashape.object_ else dshape
435 if dshape in (datashape.string, datashape.datetime_):
File ~/.local/lib/python3.10/site-packages/datashape/coretypes.py:779, in CType.from_numpy_dtype(self, dt)
777 except KeyError:
778 pass
--> 779 if np.issubdtype(dt, np.datetime64):
780 unit, _ = np.datetime_data(dt)
781 defaults = {'D': date_, 'Y': date_, 'M': date_, 'W': date_}
File /usr/lib/python3/dist-packages/numpy/core/numerictypes.py:418, in issubdtype(arg1, arg2)
360 r"""
361 Returns True if first argument is a typecode lower/equal in type hierarchy.
362
(...)
415
416 """
417 if not issubclass_(arg1, generic):
--> 418 arg1 = dtype(arg1).type
419 if not issubclass_(arg2, generic):
420 arg2 = dtype(arg2).type
TypeError: Cannot interpret 'MultiPolygonDtype(float64)' as a data type
ALL software version info
pandas=1.4.4
numpy=1.21.5
colorcet=3.0.1
datashader=0.14.2
spatialpandas=0.4.4
geopandas=0.11.1
I cannot reproduce this in a new conda environment using the same versions of the libraries that you have (excluding colorcet, which isn't used in the reproducer):
$ conda create -n temp
$ conda activate temp
$ conda install -c pyviz -c conda-forge pandas==1.4.4 numpy==1.21.5 datashader===0.14.2 spatialpandas==0.4.4 geopandas==0.11.1
$ conda list | grep "pandas\|datashader\|numpy\|datashape"
datashader 0.14.2 py_0 pyviz
datashape 0.5.4 py_1 conda-forge
geopandas 0.11.1 pyhd8ed1ab_0 conda-forge
geopandas-base 0.11.1 pyha770c72_0 conda-forge
numpy 1.21.5 py39h42add53_3
numpy-base 1.21.5 py39hadd41eb_3
pandas 1.4.4 py39he7125aa_0 conda-forge
spatialpandas 0.4.4 py_0 pyviz
Can you check what version of datashape you have installed? It hasn't changed for many years and should be 0.5.4.
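For readers who want to answer this from inside the affected notebook, here is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the package list is simply the set of libraries named in this thread. Running it in the notebook kernel itself (rather than in a terminal) reports what that kernel actually sees.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not visible to this interpreter.
            found[name] = None
    return found


# Libraries named in this thread; adjust the list as needed.
PACKAGES = ["datashape", "datashader", "spatialpandas",
            "geopandas", "pandas", "numpy"]

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```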
I see that you are using your system's python and numpy, and are using pip install --user to install other packages into ~/.local. I would be happier if you were using an isolated environment, either with conda or with pip in a virtual environment, as then we could be more sure that the packages are consistent.
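One way to confirm which interpreter and package locations a session is really using is sketched below. It relies only on the standard library; the helper names `interpreter_info` and `module_origin` are hypothetical. Note that the `in_virtualenv` check only detects `venv`/`virtualenv` environments, so it is an assumption that this is the kind of environment in use; a conda environment would report `False` here even when isolated.

```python
import importlib.util
import sys


def interpreter_info():
    """Describe the interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # True inside a venv/virtualenv; conda environments report False.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


for key, value in interpreter_info().items():
    print(f"{key}: {value}")

# Paths under ~/.local or /usr/lib, as in the traceback above, suggest
# that packages are coming from outside the intended environment.
for name in ["numpy", "datashader", "datashape", "spatialpandas"]:
    print(f"{name}: {module_origin(name)}")
```

Run inside the Jupyter notebook, this shows whether the kernel is the environment's interpreter or the system one.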
I reinstalled everything in a new virtual environment and somehow it appears to work. As you indeed mentioned, my Jupyter notebook probably did not use all libraries from my previously created virtual environment (created with python -m venv). I always aim to run my code in virtual environments, but somehow this got messed up in my Jupyter notebook, as it did not default to the correct kernel when I restarted it at some point. I apologize for this.
Interestingly, when I reverted to the versions mentioned in the spatialpandas GitHub issue, the example did work, even when my virtual environment was all jumbled up.
Perhaps still relevant: the version of datashape installed in my virtual environment is 0.5.2. Is it worth upgrading at least?
Thanks for trying out a new virtual environment and reporting back.
If you have datashape 0.5.2 then another dependency must have requested that particular version. If everything looks like it is working OK then I would be inclined to leave it as it is.