gboeing/osmnx

Conditional tolerance for intersection consolidation

EwoutH opened this issue · 16 comments

Contributing guidelines

  • I understand the contributing guidelines

Documentation

  • My proposal is not addressed by the documentation or examples

Existing issues

  • Nothing similar appears in an existing issue

What problem does your feature proposal solve?

The current consolidate_intersections function in OSMnx applies a uniform tolerance across the whole network, which may not suit complex urban models where different areas require different levels of detail. For example, a dense city center often needs a lower tolerance due to its more intricate street layout, while suburban areas might warrant higher tolerances.

What is your proposed solution?

I propose enhancing the consolidate_intersections function to accept a tolerance_dict parameter. This dictionary would map node identifiers to tolerance values, allowing conditional consolidation based on node-specific criteria. If a node does not appear in this dictionary, a default tolerance would be used.

What alternatives have you considered?

One workaround is to consolidate subsets of the network separately with different tolerances and then attempt to merge them. However, this is cumbersome and can lead to complex merging issues, particularly at the boundaries of the subsets.

Additional context

A minimal example of the proposed usage:

import networkx as nx
import osmnx as ox
from shapely.geometry import Polygon

def consolidate_intersections(G, tolerance=10, rebuild_graph=True, dead_ends=False, reconnect_edges=True, tolerance_dict=None):
    """
    The extended description of parameters...

    tolerance_dict : dict
        A dictionary mapping node identifiers to tolerance values. This enables
        applying different consolidation tolerances to different nodes.
    """
    # Rest of the existing function's code...

    if tolerance_dict:
        # New logic to apply tolerances conditionally based on tolerance_dict
        pass  # Placeholder for the new functionality

    # Rest of the existing function's code...

Example usage with the proposed feature:

# Assume G is a projected graph of an urban area
tolerance_dict = {node: 5 if node in inner_city_nodes else 15 for node in G.nodes()}
G_consolidated = ox.consolidate_intersections(G, tolerance_dict=tolerance_dict)

This example outlines how users might apply a tighter tolerance for a set of nodes (e.g., inner_city_nodes) while using a looser tolerance for the rest, making the function more flexible for many applications.
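
For completeness, inner_city_nodes is assumed above; one hypothetical way to derive it (and the reason for the Polygon import) would be:

# hypothetical: select inner-city nodes as those falling within a polygon
# delineating the city center (coordinates in the projected graph's CRS)
nodes = ox.convert.graph_to_gdfs(G, edges=False)
cx, cy = nodes.geometry.unary_union.centroid.coords[0]
inner_city_polygon = Polygon([
    (cx - 500, cy - 500), (cx + 500, cy - 500),
    (cx + 500, cy + 500), (cx - 500, cy + 500),
])
inner_city_nodes = set(nodes[nodes.geometry.within(inner_city_polygon)].index)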

Just tried an implementation, and while doable, it's also quite cumbersome, especially when rebuild_graph=True. I'm curious about alternatives to this problem.

First consolidating and then merging is problematic, because it's difficult to properly reconnect both networks after the fact.

One option would be an exclude_nodes parameter to exclude certain nodes from consolidation. Then you could first consolidate the nodes that require a higher tolerance, exclude those nodes, and then run the consolidation process again with a lower tolerance for the remaining nodes, as roughly sketched below.
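
A rough sketch of that two-pass idea (assuming a hypothetical exclude_nodes parameter, and assuming node identifiers survive the first consolidation pass):

# hypothetical two-pass workflow; exclude_nodes is a proposed parameter
# pass 1: consolidate with the higher tolerance, leaving inner-city nodes untouched
G1 = ox.consolidate_intersections(G, tolerance=15, exclude_nodes=inner_city_nodes)
# pass 2: consolidate the inner-city nodes with a lower tolerance, excluding
# everything already handled in pass 1
already_consolidated = set(G1.nodes) - set(inner_city_nodes)
G2 = ox.consolidate_intersections(G1, tolerance=5, exclude_nodes=already_consolidated)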


More generally, it would be really nice for OSMnx to allow applying functions only on parts of the network. That could potentially allow many functions to be used (conditionally) on a subset of the network, and would solve use cases like the one presented in #1140.

Just tried an implementation, and while doable, it's also quite cumbersome, especially when rebuild_graph=True. I'm curious about alternatives to this problem.

There is no straightforward way to do this. Anything truly useful for this use case would require a custom-coded solution, because of the numerous ad hoc decisions and local graph characteristics involved in meeting a specific user's analytical needs.

More generally, it would be really nice for OSMnx to allow applying functions only on parts of the network.

The standard workflow for this in the NetworkX ecosystem is to create subgraphs, apply your function, then compose those subgraphs. OSMnx follows this. If your use case cannot be easily done in this way (such as here), it suggests a deeper challenge that requires a custom solution because a generalizable solution is most likely impossible.

I'll leave some sample code here if others come upon this thread and wonder about that subgraph-function-compose workflow:

import networkx as nx
import osmnx as ox
G = ox.graph.graph_from_place("Piedmont, CA, USA", network_type="drive")
Gp = ox.projection.project_graph(G)

# divide graph into subgraphs based on location (the two masks deliberately
# overlap in a 200-unit band around the centroid so the subgraphs share nodes)
nodes = ox.convert.graph_to_gdfs(Gp, edges=False)
mask_west = nodes.geometry.x < nodes.geometry.unary_union.centroid.x + 100
mask_east = nodes.geometry.x > nodes.geometry.unary_union.centroid.x - 100
G_west = ox.truncate.largest_component(Gp.subgraph(nodes[mask_west].index))
G_east = ox.truncate.largest_component(Gp.subgraph(nodes[mask_east].index))

# then run some function differentially
# then nx.compose() the subgraphs back together
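
For concreteness, those last two commented steps might look like this (a sketch; the "zone" attribute is purely illustrative, and see the next comment for why this breaks down when the differential function is consolidation):

# run some function differentially on each subgraph, e.g. tag the nodes
nx.set_node_attributes(G_west, "west", name="zone")
nx.set_node_attributes(G_east, "east", name="zone")

# compose the subgraphs back together; for nodes/edges present in both,
# nx.compose keeps the attributes from the second argument
G_composed = nx.compose(G_west, G_east)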

Thanks for getting back. I recently thought about this, tried some things and had a nice discussion with @anastassiavybornova.

The basic complication is that some modifications (including consolidation) alter your network. If your network is singular, this most often isn't a problem: some node locations might change or some attributes might be modified, but the graph will still stay connected (since the connections are saved in the networkx graph itself).

If you have multiple networks that you modify differently (like consolidating with multiple radii), that might not be the case. Nodes get moved, osmid values get merged into lists, and sometimes other attributes change. This makes merging, whether on osmid values or geospatially, difficult.

Put simply: nx.compose() often doesn't work after consolidating subnetworks (illustrated below). So unfortunately, the solution proposed above doesn't resolve this issue.
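
To illustrate the failure mode (a sketch, reusing G_west and G_east from the subgraph example above):

# consolidating each subgraph separately relabels its nodes 0..n-1 and stores
# the original ids in an "osmid_original" attribute (a list where several
# nodes were merged)
Gc_west = ox.simplification.consolidate_intersections(G_west, tolerance=15)
Gc_east = ox.simplification.consolidate_intersections(G_east, tolerance=5)

# the two graphs now share integer node ids that refer to different physical
# intersections, so nx.compose() silently merges unrelated nodes rather than
# reconnecting the networks at their true shared boundary
G_broken = nx.compose(Gc_west, Gc_east)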

If this were handled within OSMnx itself, the network could remain singular. Moving nodes around and changing osmids then wouldn't be a problem, since how they are connected is saved in the (singular) graph.

I think the tolerance_dict that maps node identifiers to tolerance values would still be a good solution to this problem. Another option would be to allow using a node attribute (or a column in the node dataframe) to define the tolerance.

Furthermore, I think there's considerable demand for such a function. While in my use case it's a simple geospatial bound, you could also have researchers who want to consolidate different types of intersections differently. Since those might be distributed all over the network, such a use case would be even harder to merge.

So, considering:

  1. This is a difficult issue to fix outside OSMnx
  2. There is considerable demand for such a solution

I hope you would be willing to discuss this further.

(CC @martinfleis, @jdmcbr)

also cc'ing @jGaboardi here

Yes I'm open to considering this further if it's useful to the community. If a dict of per-node tolerance values would be helpful for users, would you perhaps put together some example code showing how to implement this in the codebase in an efficient way?

Thanks, and great to hear! Could you reopen this issue?

@anastassiavybornova would you or someone on your team want to give this a go?

FWIW, just to reiterate the challenge: I had tried to implement an "adaptive" or "per-node" tolerance for intersection consolidation a couple of years ago but was consistently stymied by my inability to make it broadly generalizable and computationally efficient, with a streamlined API. Like I said earlier in this thread:

There is no straightforward way to do this. Anything truly useful for this use case would require a custom-coded solution, because of the numerous ad hoc decisions and local graph characteristics involved in meeting a specific user's analytical needs.

Perhaps if several people are looking at this now, the wisdom of the crowd will yield a clever way to push it forward. A few people have inquired about this over the years, but it's consistently been harder to implement than it appears to be on the surface.

Good to know, and thanks for the heads up.

Currently I understand the problem and implementation as follows:

  • The only thing tuned is the tolerance parameter for each node, so that the distance each node is buffered can vary.

    tolerance : float
    nodes are buffered to this distance (in graph's geometry's units) and
    subsequent overlaps are dissolved into a single node

  • This basically means that instead of doing a single buffer operation in _merge_nodes_geometric, it needs to do multiple buffers on parts of the nodes geodataframe.

    # buffer nodes GeoSeries then get unary union to merge overlaps
    merged = convert.graph_to_gdfs(G, edges=False)["geometry"].buffer(tolerance).unary_union

  • Then, it should do the .unary_union as specified, and other operations can continue as is.

Am I missing something so far?

As for the API, we could go two ways:

  1. Define a tolerance_dict argument: a dict mapping node identifiers to tolerance values (as proposed above).
  2. Define a tolerance_column argument: the name of a column in the nodes GeoDataFrame that contains per-node tolerance values.

I'm now leaning towards the second one, because less complex data needs to be passed.

This basically means that instead of doing a single buffer operation in _merge_nodes_geometric, it needs to do multiple buffers on parts of the nodes geodataframe.

This can still be a single buffer call, as buffer accepts an array as its distance parameter. If you get a dictionary of tolerances, you just need to order it the same way as the nodes, get the values as a numpy array, and replace the constant tolerance with that. The rest might stay the same? I haven't seen the code in detail, but it should work like that imho.
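
For reference, a minimal self-contained demonstration that GeoSeries.buffer accepts per-geometry distances:

import geopandas as gpd
import numpy as np
from shapely.geometry import Point

gs = gpd.GeoSeries([Point(0, 0), Point(10, 0), Point(20, 0)])
radii = np.array([1.0, 2.0, 5.0])  # one buffer distance per geometry
buffered = gs.buffer(radii)
print(buffered.area.round(2))  # areas grow with each point's own radius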

This can still be a single buffer call, as buffer accepts an array as its distance parameter.

Didn't know that (and hadn't read the docs yet). It's already really useful to have a GeoPandas maintainer here, thanks for joining the conversation :).


In the case 1 API (dict), it could look like this:

    import pandas as pd

    gdf_nodes = graph_to_gdfs(G, edges=False)

    # If tolerance_dict is provided, create a pandas Series from it with the
    # node index as the Series index. This Series will have the same length as
    # the gdf_nodes and provide tolerance values for each node. Nodes not in
    # the dict will get the default tolerance.
    if tolerance_dict:
        tolerances = pd.Series(tolerance_dict).reindex(gdf_nodes.index, fill_value=tolerance)
    else:
        # If no tolerance_dict is provided, use the default tolerance for all nodes
        tolerances = pd.Series(tolerance, index=gdf_nodes.index)

    # Buffer using the tolerances Series for variable distances
    merged = gdf_nodes['geometry'].buffer(distance=tolerances).unary_union

Case 2 (the column API) could look like this:

    import numpy as np

    gdf_nodes = graph_to_gdfs(G, edges=False)

    if tolerance_column and tolerance_column in gdf_nodes.columns:
        # Use the values from the tolerance_column as an array for buffering,
        # falling back to the default tolerance where the column has no value
        buffer_distances = gdf_nodes[tolerance_column].fillna(tolerance).values
    else:
        # Use the default tolerance for all nodes
        buffer_distances = np.full(len(gdf_nodes), fill_value=tolerance)

    # Buffer all nodes in a single operation
    merged = gdf_nodes['geometry'].buffer(distance=buffer_distances).unary_union

I think I would find the case 2 API more reliable, since you can't mess up the node indexing by accident.

Usage example for case two:

# Calculate the street count for each node and assign the corresponding tolerance directly in the node's attributes
for node, count in ox.stats.streets_per_node(G).items():
    # Nodes with street count of 4 or higher get a tolerance of 5, others get 10
    G.nodes[node]['tolerance'] = 5 if count >= 4 else 10

# When calling consolidate_intersections, the modified version of the function will look for the 'tolerance' column in the nodes GeoDataFrame.
G_consolidated = ox.consolidate_intersections(G, tolerance=10, tolerance_column='tolerance')

And of course there are a lot of other things possible here.

I can open a draft PR tomorrow. Curious what everybody thinks!

I'm at the AAG conference so I won't be super responsive this week but I'll take a look when I can.

Thanks, and no worries!

I opened PR #1160 so others can already start testing and reviewing.

Closed by #1160