'dask_geopandas.from_dask_dataframe' produces error: 'DataFrame' object has no attribute 'map_partitions'
komzy opened this issue · 1 comments
komzy commented
I'm writing a simple script to read a large GeoJSON file (>3 GB) into dask and convert it to a dask-geopandas dataframe. However, I run into the above error.
Here's my code:
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
import dask_geopandas
import dask.dataframe as dd
dask_df = dd.read_json('madagascar_gen.txt',orient='list').compute()
dgpd = dask_geopandas.from_dask_dataframe(dask_df, geometry="geometry")
Error log:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 dgpd = dask_geopandas.from_dask_dataframe(dask_df, geometry="geometry")
File ~/opt/anaconda3/lib/python3.9/site-packages/dask_geopandas/core.py:790, in from_dask_dataframe(df, geometry)
786 name = geometry.name if geometry.name is not None else "geometry"
787 return df.assign(**{name: geometry}).map_partitions(
788 geopandas.GeoDataFrame, geometry=name
789 )
--> 790 return df.map_partitions(geopandas.GeoDataFrame, geometry=geometry)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py:5575, in NDFrame.__getattr__(self, name)
5568 if (
5569 name not in self._internal_names_set
5570 and name not in self._metadata
5571 and name not in self._accessors
5572 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5573 ):
5574 return self[name]
-> 5575 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'map_partitions'
madagascar_gen.json:
[
{"geometry":{"coordinates":[44.3207501,-20.290752],"type":"Point"},"type":"Feature","properties":{"oID":"1","timestamp":"2022-09-02 11:05:44"}},
{"geometry":{"coordinates":[44.32089653504225,-20.290709591647275],"type":"Point"},"type":"Feature","properties":{"oID":"1","timestamp":"2022-09-02 11:05:44"}},
{"geometry":{"coordinates":[44.32104297004467,-20.290667183294346],"type":"Point"},"type":"Feature","properties":{"oID":"1","timestamp":"2022-09-02 11:05:44"}},
...
]
Anyone know why this is happening?
martinfleis commented
You are not passing a dask.dataframe to dask_geopandas.from_dask_dataframe. When you call compute(), dask computes the task graph and returns a pandas dataframe. The code above should be like this if you want to read with dask.dataframe:
dask_df = dd.read_json('madagascar_gen.txt',orient='list')
dgpd = dask_geopandas.from_dask_dataframe(dask_df, geometry="geometry")
But given the file is GeoJSON, you will need to create the geometry array yourself. The better option would be to read it directly with dask-geopandas:
dgpd = dask_geopandas.read_file("madagascar_gen.json", npartitions=4)
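For reference, if you do go the dask.dataframe route, "creating the geometry array yourself" could look roughly like the sketch below. It works on a plain pandas frame built from the three sample features in the report. The helper name `to_geodataframe` is hypothetical, and the CRS `EPSG:4326` is an assumption (GeoJSON coordinates are normally WGS 84 longitude/latitude). The same function could in principle be applied to each partition of a dask dataframe, but that would need a suitable `meta` and is not shown here.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import shape

# The three sample features from the report, as Python dicts.
records = [
    {"geometry": {"coordinates": [44.3207501, -20.290752], "type": "Point"},
     "type": "Feature",
     "properties": {"oID": "1", "timestamp": "2022-09-02 11:05:44"}},
    {"geometry": {"coordinates": [44.32089653504225, -20.290709591647275], "type": "Point"},
     "type": "Feature",
     "properties": {"oID": "1", "timestamp": "2022-09-02 11:05:44"}},
    {"geometry": {"coordinates": [44.32104297004467, -20.290667183294346], "type": "Point"},
     "type": "Feature",
     "properties": {"oID": "1", "timestamp": "2022-09-02 11:05:44"}},
]

def to_geodataframe(part: pd.DataFrame) -> gpd.GeoDataFrame:
    """Hypothetical helper: turn GeoJSON-style geometry dicts into shapely objects."""
    # shape() converts a GeoJSON-like mapping into the matching shapely geometry.
    geoms = [shape(g) for g in part["geometry"]]
    # Assumption: coordinates are WGS 84 lon/lat, hence EPSG:4326.
    return gpd.GeoDataFrame(
        part.drop(columns="geometry"), geometry=geoms, crs="EPSG:4326"
    )

pdf = pd.DataFrame(records)
gdf = to_geodataframe(pdf)
print(gdf.geometry.geom_type.tolist())
```

The `properties` column stays a column of dicts in this sketch; flattening it into separate columns is left out to keep the example short.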