pmneila/PyMaxflow

Remove nodes from graph

Closed this issue · 4 comments

I need a grid-like graph and it would be a lot easier if I could build a grid graph first and then remove a couple of nodes, instead of building the graph node-by-node.

Is it possible to remove a node from an existing graph?

Hi @nsn88,

No, unfortunately the core library of PyMaxflow does not support removing nodes and the associated edges. However, you can just avoid creating edges to those nodes. Given that PyMaxflow omits all the edges created to/from the node id -1, you can substitute the corresponding nodeids by -1. For example:

import maxflow

g = maxflow.GraphFloat()
nodeids = g.add_grid_nodes((4, 5))
nodeids[1, 2] = -1
print(nodeids)
# [[ 0  1  2  3  4]
#  [ 5  6 -1  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]
g.add_grid_edges(nodeids)

This code generates this graph:
Figure_1

Does that help with your problem?

Given that PyMaxflow omits all the edges created to/from the node id -1, you can substitute the corresponding nodeids by -1.

Oh, that's neat, thanks for the tip. But I guess this won't reduce the memory consumption, which is my main concern.

My graph has different edge capacities, so anyway I have to loop through all the edges to set the capacity, and I can just skip the unwanted edges. But the problem is that my graph has around 4,000,000 nodes (200x200x100 grid) of which 30% are unwanted ones. If I could remove those nodes, memory consumption can be brought down.

Hi,

Got a few comments:

  1. Nodes are very cheap regarding memory (IIRC, just two integers are stored per node). 4 million nodes use around 60Mb of memory, and 30% is just an overhead of 18Mb. Not sure if you are working with important memory limitations, but otherwise this should not be a huge issue.
  2. In any case, you can create the exact number of nodes you need and then distribute them in the grid array, filling the empty spaces with -1. For example, if you only need 3 nodes in a 4x4 grid:
g = maxflow.GraphFloat()
nodeids = g.add_grid_nodes(3)
# Locations of the 3 nodes in the grid (rows, cols)
# You can also use a boolean mask or any other type of NumPy's advanced indexing
locations = [0, 1, 2], [1, 3, 3]
grid = np.full((4, 4), -1)
grid[locations] = nodeids
print(grid)
# [[-1  0 -1 -1]
#  [-1 -1 -1  1]
#  [-1 -1 -1  2]
#  [-1 -1 -1 -1]]
  1. If it's easier for you, you can also create a list of node pairs (source, target), one per edge, and a similar array of the capacities of those edges. Then you can pass that to add_grid_edges with a proper structure to create edges from left to right. You don't need to explicitly put the nodeids in an array with the shape of your grid. This way, you can define advanced graphs without looping through the edges, which is much slower. In any case, it is still necessary to be able to calculate capacities without looping, otherwise nothing is gained by doing this.
g = maxflow.GraphFloat()
nodeids = g.add_grid_nodes(3)
# Connect 0->2, 2->1, 1->0 with capacities 0.5, 1.5 and 2.5.
grid = nodeids[np.array([
    [0, 2],
    [2, 1],
    [1, 0]
])]
capacities = np.array([
    [0.5, 0.0],
    [1.5, 0.0],
    [2.5, 0.0]
])
# Structure to create edges from left to right
structure = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0]
])
g.add_grid_edges(grid, weights=capacities, structure=structure)

Hope at least one of those options helps.

Awesome! This surely helps. You should consider including the code in comment-2 in the documentation/examples.

Regarding comment-1, my PC has 64 GB memory, so going by your numbers, I probably shouldn't worry about removing unwanted nodes. I'll do a memory profile and see if this is needed.

Thanks for the quick response.