pawelswoboda/RAMA

Not getting any cuts for nearly disconnected components


Hello,

I'm trying multicut as an alternative approach to graph clustering. I'm testing it on an unweighted, SBM-generated graph with a very modular structure (see figure below).

I find that RAMA returns a partition that puts every node into a single connected component instead of breaking the graph into its modules. If I randomize the edge weights, I get an apparently random partition.

I understand that if all the costs are equal, there should be no reason to cut any edge. However, all the edges that are not present in the edgelist have an implicit negative cost. Is there any way to work around this?

Here's my code for reference:

import numpy as np
import networkx as nx
import rama_py as mc

# helper: flatten a networkx graph into (source, destination, weight) lists
def from_nxgraph_to_tuple(g):
    src = []
    dst = []
    weights = []
    for i, j, attrs in g.edges(data=True):
        src.append(i)
        dst.append(j)
        weights.append(attrs['weight'])
    return src, dst, weights

# generate graph
g = nx.stochastic_block_model([20,20,20], [ [0.8,0.01,0.01], [0.01,0.8,0.01], [0.01,0.01,0.8] ], directed=True)
nx.set_edge_attributes(g, values = 1.0, name = 'weight')
src, dst, weights = from_nxgraph_to_tuple(g)

# obtain multicut partition
opts = mc.multicut_solver_options("PD")
P = mc.rama_cuda(src, dst, weights, opts)

I then plot the graph g with the obtained partition P (the node label is the cluster id). I expected a partition into three clusters, one for each almost-disconnected component, but instead the whole graph ends up as a single component:
[figure: graph plot where every node carries the same cluster id]

If, instead, I randomize the weights (i.e., weights = np.random.rand(len(src))), I obtain a random partition (note how each node is its own cluster):
[figure: graph plot where each node carries its own cluster id]

Are these behaviors expected?

Thanks a lot!

Hi,

Thanks for trying out RAMA. In the multicut problem, if all edge weights are positive then the optimal solution is to set all y variables to 0 (i.e., join the nodes), since cutting any edge can only increase the objective. See eq. 2 in the RAMA paper. So the result of your first case is expected.
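
A minimal illustration (a toy sketch, assuming the objective of eq. 2 is to minimize the sum of c_e * y_e over feasible multicuts, where y_e = 1 means edge e is cut):

# toy instance: four edges, all with strictly positive cost
costs = [1.0, 1.0, 1.0, 1.0]

def objective(y):
    # multicut objective: sum of edge cost times cut indicator
    return sum(c * y_e for c, y_e in zip(costs, y))

print(objective([0, 0, 0, 0]))  # 0.0 -- join everything
print(objective([1, 1, 0, 0]))  # 2.0 -- cutting edges only adds positive cost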

For the second case with random weights, I think all weights are still non-negative, no? (np.random.rand draws uniformly from [0, 1).) I tried it and it also puts all nodes in the same cluster.

Overall, for every pair of nodes that you know should be joined or separated, the cost should be > 0 or < 0, respectively. Note that nodes not connected by any path in the edge list cannot end up in the same component.
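
For example, a small signed instance along these lines (the node ids and cost magnitudes below are made up purely for illustration):

import rama_py as mc

# positive cost = prefer joining the endpoints, negative cost = prefer separating them
src     = [0, 1, 2, 0]
dst     = [1, 2, 3, 3]
weights = [2.0, 2.0, 1.5, -3.0]   # nodes 0 and 3 repel each other

opts = mc.multicut_solver_options("PD")
res = mc.rama_cuda(src, dst, weights, opts)

Here separating node 3 from the rest cuts the edges (2,3) and (0,3) for a total cost of 1.5 - 3.0 = -1.5, which is better than the all-joined objective of 0.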

Happy to discuss more.
Ahmed

One more thing:

"all the edges that are not present in the edgelist have an implicit negative cost."

You would like to impose a negative cost for absent edges? If your instance is small enough, you can just add those edges explicitly with a negative cost.
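
A sketch of that, building on the code from your post (the -1.0 repulsion value is an arbitrary choice you would want to tune):

import itertools

# keep the positive (attractive) costs of the edges that are present
src, dst, weights = from_nxgraph_to_tuple(g)

# add every absent node pair explicitly with a negative (repulsive) cost
for i, j in itertools.combinations(g.nodes(), 2):
    if not g.has_edge(i, j) and not g.has_edge(j, i):
        src.append(i)
        dst.append(j)
        weights.append(-1.0)  # arbitrary repulsion strength

opts = mc.multicut_solver_options("PD")
P = mc.rama_cuda(src, dst, weights, opts)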

By the way, there is a slight difference between the multicut problem and correlation clustering. In the multicut problem the absent edges do not express any preference. Perhaps you want to compute correlation clustering, which, I think, assumes absent edges have an implicit negative cost. One possibility in this direction would be to try out: https://github.com/pxinghao/ParallelCorrelationClustering

Closing due to inactivity. Nevertheless, I would be happy to discuss more.