Why do the number and x columns of '201105.shp' become 0 in the output of this code?
1jiangxd opened this issue · 1 comments
(Two shp files have been uploaded to my GitHub repository)
https://github.com/1jiangxd/daskgeopandasproblems
The code I used is shown below. When I check the processed '201105.shp', only the first 2 million rows were processed correctly; in the remaining rows the original values of the number and x columns have changed to 0.
Where does the problem lie in this code? If anyone can answer, I would greatly appreciate the help.
import geopandas as gpd
import time
import dask_geopandas
def process_row(row):
    outwen = r'201105.shp'
    bianjie = r'2023xian.shp'
    jiabianjie = r'E:\201105out'
    start_time3 = time.time()
    # Read input and clipped boundary shapefiles
    target_gdf = gpd.read_file(outwen)
    join_gdf = gpd.read_file(bianjie)
    # Switch to dask approach
    target_gdfnew = dask_geopandas.from_geopandas(target_gdf, npartitions=4)
    # Reproject the boundary participating in the join to match the CRS of the target geometry
    join_gdf = join_gdf.to_crs(target_gdf.crs)
    # Switch to dask approach
    join_gdfnew = dask_geopandas.from_geopandas(join_gdf, npartitions=4)
    # Use spatial join to find intersecting parts
    joined = gpd.sjoin(target_gdfnew, join_gdfnew, how='inner', predicate='intersects')
    # Add attributes from 'bianjie' to 'outwen'
    joined = joined.drop(columns='index_right')  # Remove redundant index column
    result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())
    # Save the result to the output boundary
    result.to_file(jiabianjie, encoding='utf-8-sig')  # Ensure the correct encoding is used
    end_time3 = time.time()
    execution_time3 = end_time3 - start_time3
    print(f"'{jiabianjie}' has added boundaries. Start time: {start_time3:.2f}, End time: {end_time3:.2f}, Execution time: {execution_time3:.2f} seconds")
process_row()
print('Finish')
@1jiangxd apologies for the slow reply, but looking at your code, the following lines
# Add attributes from 'bianjie' to 'outwen'
joined = joined.drop(columns='index_right') # Remove redundant index column
result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())
are typically not needed. The result of the spatial join, `joined`, already has the columns of the original `target_gdf`, so this additional merge does nothing except bring back the original rows of `target_gdf` that had no match in the spatial join. To achieve the same result, you can do a left join directly (by specifying `how='left'` in the `sjoin` call).
Also, I assume that the `gpd.sjoin` in your code above should be `dask_geopandas.sjoin`?
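
To illustrate the point about the left join, here is a minimal sketch using plain `geopandas.sjoin` on tiny in-memory data. The `points` and `boundary` frames and the `name` column are hypothetical stand-ins for the two shapefiles, not the actual data. It uses plain geopandas because `how='left'` is known to work there; whether `dask_geopandas.sjoin` accepts `how='left'` may depend on the installed version, so that is worth checking before relying on it.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-in for the target layer ('201105.shp' in the question).
points = gpd.GeoDataFrame(
    {"number": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# Hypothetical stand-in for the boundary layer ('2023xian.shp' in the question).
boundary = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Inner join: only rows of `points` that intersect a boundary polygon survive.
inner = gpd.sjoin(points, boundary, how="inner", predicate="intersects")

# Left join: every row of `points` is kept; unmatched rows get NaN in the
# boundary columns, so no separate merge step is needed afterwards.
left = gpd.sjoin(points, boundary, how="left", predicate="intersects")
left = left.drop(columns="index_right")  # remove the helper index column

print(len(inner), len(left))
```

In this sketch the third point lies outside both polygons, so it is dropped by the inner join but kept by the left join with a missing `name`.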