GeospatialPython/pyshp

Inconsistent bounding box filtering

lgolston opened this issue · 1 comments

PyShp Version

2.3.1

Python Version

3.11

Your code

import shapefile

fname = "dataverse_files/v6_1820_pref_pts_gbk.shp"
bbox = [19166000, 3461000, 20297000, 3596000]

sf = shapefile.Reader(fname, encoding='gbk')
print(len(list(sf.iterShapeRecords(bbox=bbox))))

Expected results

10

Actual results

301

Other notes

As part of a PR over at cartopy (SciTools/cartopy#2236) I was comparing the output from bounding box spatial filtering with PyShp and fiona for several shapefiles and found some differences. For this shapefile (which just contains Points), PyShp does not seem to filter at all. I think this is a bug? The data used is at https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/2K4FHX/V0WEFJ&version=1.0

Okay, the pull request created hopefully addresses the problem with the shapefile mentioned above involving points.

Another case I tried is the 50m land and glaciated areas Natural Earth shapefiles, which both worked as expected.

However, at 10m resolution the Natural Earth land shapefile switches from containing many Polygon shapes to a small number of MultiPolygon, and the bounding box filtering appears to break down again. For instance, selecting a bbox over the ocean ([-25, -25, -5, -5]) for the land shapefile (https://www.naturalearthdata.com/downloads/10m-physical-vectors/, using v5.1.1), fiona returns only 2 records while pyshp returns 10 (out of the 11 total records). I suspect this is because fiona looks at individual polygons, while PyShp only considers the overall bounding box of the MultiPolygon (which has shapeType 5 as for Polygon):

pyshp/shapefile.py

Lines 1318 to 1325 in d3b6e2a

elif shapeType in (3,5,8,13,15,18,23,25,28,31):
record.bbox = _Array('d', unpack("<4d", f.read(32)))
# if bbox specified and no overlap, skip this shape
if bbox is not None and not bbox_overlap(bbox, record.bbox):
# because we stop parsing this shape, skip to beginning of
# next shape before we return
f.seek(next)
return None