mattijn/topojson

to_gdf after topology detection

Closed this issue ยท 4 comments

Hi @mattijn ! Thank you for your work on topojson, it is much appreciated ๐Ÿ‘
I would like to raise the following issue : after computing a topology in which one of the polygons shares arcs with more than one other polygon, the geodataframe that is produced contains a duplicated coordinate for the polygon that is connected to several others. Below the reproducing code :

import numpy as np
from shapely.geometry import Polygon
import geopandas as gpd

poly_0 = Polygon([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.]])
poly_1 = Polygon([[0.0, 1.0], [1.0, 1.0], [1.0, 2.0], [0.0, 2.0], [0.0, 1.]])
poly_2 = Polygon([[1.0, 0.0], [2.0, 0.0], [2.0, -1.0], [1.0, -1.0], [1.0, 0.]])

gdf = gpd.GeoDataFrame({
    "name": ["abc", "def", "ghi"],
    "geometry": [
        poly_0,
        poly_1,
        poly_2
    ]
})
topojs=tj.Topology(gdf, prequantize=False, topology=True)
np.array(topojs.to_gdf().geometry[0].exterior.coords)

returns :
array([[0., 1.], [0., 0.], [1., 0.], [2., 0.], [2., 0.], [2., 1.], [1., 1.], [0., 1.]])

This is not an issue visually, but when using the result for further calculation, it can create problems.

Thanks for trying this package! And thanks for raising the issue! It took me a long lime to realise that the [2., 0.], [2., 0.] are duplicates.. And this indeed should not happen.

A quick scan show that topojs resolves into:

Topology(
{'arcs': [[[1.0, 1.0], [2.0, 1.0], [2.0, 0.0]],
          [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [0.0, 2.0], [1.0, 2.0], [1.0, 1.0]],
          [[1.0, 1.0], [0.0, 1.0]],
          [[1.0, 0.0], [2.0, 0.0]],
          [[2.0, 0.0], [2.0, -1.0], [1.0, -1.0], [1.0, 0.0]]],
 'bbox': (0.0, -1.0, 2.0, 2.0),
 'coordinates': [],
 'objects': {'data': {'geometries': [{'arcs': [[-4, 0, -5, 1]],
                                      'properties': {'name': 'abc'},
                                      'type': 'Polygon'},
                                     {'arcs': [[2, 3]],
                                      'properties': {'name': 'def'},
                                      'type': 'Polygon'},
                                     {'arcs': [[4, 5]],
                                      'properties': {'name': 'ghi'},
                                      'type': 'Polygon'}],
                      'type': 'GeometryCollection'}},
 'type': 'Topology'}
)

Where this behaviour only occurs on geometry[0] which has its arcs referenced as [[-4, 0, -5, 1]]. So this is:

  • arc 3 reversed
  • arc 0
  • arc 4 reversed
  • arc 1

I cannot see why it behaves wrongly here were the other geometries behave normally. For setting up a specific test I suspect the focus should be on this part: https://github.com/mattijn/topojson/blob/master/topojson/utils.py#L155:L160.

Thanks again for raising!

EDIT:
Example test to add in test_topology.py from where debugging can start once I'm near a developing machine:

def test_topology_geojson_duplicates():
    p0 = wkt.loads('POLYGON ((0 0, 0 1, 1 1, 2 1, 2 0, 1 0, 0 0))')
    p1 = wkt.loads('POLYGON ((0 1, 0 2, 1 2, 1 1, 0 1))')
    p2 = wkt.loads('POLYGON ((1 0, 2 0, 2 -1, 1 -1, 1 0))')
    data = gpd.GeoDataFrame({"name": ["abc", "def", "ghi"], "geometry": [p0, p1, p2]})
    topo = tp.Topology(data, prequantize=False)
    p0_wkt = topo.to_gdf().geometry[0].wkt

    assert p0_wkt == 'POLYGON ((0 1, 0 0, 1 0, 2 0, 2 1, 1 1, 0 1))'

Thank you for prompt response ! Indeed I should have pointed out the duplicate. Sorry for that !

Thanks also for pointing out where the problem comes from.

After some debugging, here is why this fails with this particular geometry :
You do [i > 0 :] at line 157 to remove the first coordinate of an arc if it is not the first one. The problem comes when an arc is reversed while an other arc has a greater number of points. In this situation, because you reverse the arc first, you only remove the nans used as padding in the arcs' numpy array. Also, removing the element before inverting the order is not a solution because you need to remove the first element in the final order.

I ended up writing the following (I didn't manage to keep it a one liner) to replace line 155 to 161:

coord_list = []
for i, arc in enumerate(arcs):
    arc_coords = tp_arcs[arc if arc >= 0 else ~arc][:: arc >= 0 or -1]
    arc_coords = arc_coords[~np.isnan(arc_coords).any(axis=1)]
    coord_list.append(arc_coords[i > 0 :])
coords = np.concatenate(coord_list)

What do you think ?

Yes, thanks for looking into it. What you propose is a good solution. If you can turn it into a Pull Request, that would be great!

Fixed by #112