sbromberger/LightGraphs.jl

betweenness_centrality() runs indefinitely with negative weights

jonasjonker opened this issue · 28 comments

Description of bug
When calling betweenness_centrality() on a large(ish) undirected simple Int64 graph with Float64 weights, Julia keeps running for a long time until it crashes after filling up RAM.

How to reproduce
Run the code provided with the public data linked below. I tried to reproduce it with a small graph, but that seems to work fine.

Expected behavior
I suspect an infinite loop is occurring. I would expect either an error (if the loop really is infinite) or a result.

Actual behavior
Nothing; Julia keeps calculating indefinitely (in Jupyter) or until RAM is full and a crash occurs (in Atom).

Code demonstrating bug
The data used is publicly available and can be found on Qiita (a microbiome database).

using BenchmarkTools
using FlashWeave
using ParserCombinator
using GraphIO.GML
using LightGraphs
using SimpleWeightedGraphs

ID                  = 1001
ROOT                = "/home/jonas/Repos/Thesis/data/"
data_path           = "$(ROOT)$(ID)/processed_data/1_otu_table.biom"
gml_no_meta_path    = "$(ROOT)$(ID)/network_no_meta.gml"

netw_results_no_meta = FlashWeave.learn_network(data_path,
                                                sensitive     = true,
                                                heterogeneous = false)


save_network(gml_no_meta_path, netw_results_no_meta)
G = graph(netw_results_no_meta) 
# {6558, 8484} undirected simple Int64 graph with Float64 weights

# The unweighted version of the same graph.
g = loadgraph(gml_no_meta_path, GMLFormat()) 
# {6558, 8484} undirected simple Int64 graph

# A toy example.
gw = SimpleWeightedGraph(smallgraph(:house))
# {5, 6} undirected simple Int64 graph with Float64 weights


@time centrality_arr  = betweenness_centrality(g)
# 12.467820 seconds (45.30 M allocations: 7.531 GiB, 2.73% gc time)

@time centrality_arr  = betweenness_centrality(gw)
# 0.271050 seconds (282.41 k allocations: 13.838 MiB)

@time centrality_arr  = betweenness_centrality(G)
# steadily increasing RAM usage until RAM is full and julia crashes.

Version information

julia> VERSION
v"1.5.3"

julia> versioninfo()
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
 OS: Linux (x86_64-pc-linux-gnu)
 CPU: AMD Ryzen 5 3600 6-Core Processor
 WORD_SIZE: 64
 LIBM: libopenlibm
 LLVM: libLLVM-9.0.1 (ORCJIT, znver2)
Environment:
 JULIA_EDITOR = atom  -a
 JULIA_NUM_THREADS = 6

Pkg> usage LightGraphs
[093fc24a] LightGraphs v1.3.5

Additional context
I originally found the bug when using the package FlashWeave. I first thought that package caused the bug in some way, but then I realized I was comparing two different versions of the graph I generated with it: a weighted and an unweighted one.

That is definitely not a large graph. Let me see if I can reproduce.

Can you provide a link to the graph?

Also, I don't see where you're using Float64 weights. I've done BC on million-node, ten-million-edge graphs before without memory issues, so I'm not quite sure where the problem is.

How much memory do you have?

Ah, in 1.x BC is using Dijkstra regardless of whether you're passing weights in. This was one of the things I changed for 2.0. I suppose we should backport that change so that BFS is used if weights aren't provided.
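To make that concrete, here is a minimal sketch against the LightGraphs 1.x API (the toy graph is a stand-in, not the reporter's data): betweenness_centrality optionally takes a vertex list and a distance matrix, and with no matrix every edge defaults to unit distance, though the traversal is still Dijkstra-based:

```julia
using LightGraphs

g = smallgraph(:house)   # small stand-in graph

# With no distance matrix, all edges get unit distance.
bc_default = betweenness_centrality(g)

# Passing the default weights explicitly gives the same result.
bc_explicit = betweenness_centrality(g, vertices(g), LightGraphs.weights(g))

bc_default ≈ bc_explicit   # true
```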

I made a random connected graph and tested it. It's not a good reproduction, but it shows that we're not doing anything crazy:

julia> @btime betweenness_centrality(g);
  19.432 s (49225739 allocations: 8.17 GiB)

julia> g
{6558, 8484} undirected simple Int64 graph

The data can be downloaded at this microbiome database. You need to sign up for a free account to access it, so I uploaded the file to Google Drive (2 MB) as well.

I have 32 GB of RAM + 16 GB swap.

Can you post a GML version (or an .lgz) so I don't have to convert?

Got it. Let me do some tests.

I can't reproduce this:

julia> g = loadgraph("network_no_meta.gml", "graph", GML.GMLFormat())
{6558, 8484} undirected simple Int64 graph          

julia> @btime betweenness_centrality(g);            
  18.158 s (44859750 allocations: 7.51 GiB)

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 5 3400G with Radeon Vega Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)

Yes, but when you use loadgraph("network_no_meta.gml", "graph", GML.GMLFormat()) it returns an unweighted graph, not a weighted one. Should loadgraph() return a weighted graph when weights are provided in the .gml file? I checked, and the edges do contain weights in the .gml file.

The graph returned by FlashWeave is weighted.

Should loadgraph() return a weighted graph when the weights are provided in the .gml file?

No. Right now, GraphIO returns instances of SimpleGraph, which are unweighted.

Is there a way for me to share the weighted graph?

Is there a way for me to share the weighted graph?

You'd have to create a weighted graph first, which you're not doing, since GraphIO produces SimpleGraphs, which don't understand weights.

Even with weights, I can't reproduce this.

julia> w = rand(6558,6558);  # let's make some random weights

julia> @btime betweenness_centrality(g, vertices(g), w);
  34.573 s (39723263 allocations: 7.40 GiB)         

The random graph doesn't cause me any problems either.

What would be the best way for me to inspect the Graph created?

julia> w = rand(6558,6558);  # let's make some random weights

julia> @btime betweenness_centrality(g, vertices(g), w);
  18.441 s (39723563 allocations: 7.41 GiB)

If the randomly-weighted graph doesn't cause any problems for you, then I don't understand how its unweighted counterpart is causing issues. If anything, the unweighted one should be much faster (as is evidenced by comparing the times with and without weights).

What if you did a g = squash(g) before running BC? This will turn the vertices into Int16s which will half the memory requirements.

What if you did g = squash(g) before running BC? This will turn the vertices into Int16s, which will halve the memory requirements.

Oh, wait.

G = graph(netw_results_no_meta) 
# {6558, 8484} undirected simple Int64 graph with Float64 weights

Where is this graph function? It's creating a SimpleWeightedGraph.

Still can't reproduce with a SimpleWeightedGraph:

julia> w = zeros(6558,6558);

julia> for e in edges(g)
           r = rand()
           w[src(e), dst(e)] = r
           w[dst(e), src(e)] = r
       end

julia> s = SimpleWeightedGraph(g)
{6558, 8484} undirected simple Int64 graph with Float64 weights

julia> s.weights = sparse(w);   # don't ever do this in real life

julia> @btime betweenness_centrality(s);
  24.036 s (39724592 allocations: 7.38 GiB)

graph() is a FlashWeave function, and it does indeed create a SimpleWeightedGraph. This is the graph that then causes me issues.

When I export the network created as a .gml file and then import it, it creates a SimpleGraph which doesn't cause me any trouble.

The fact that a SimpleWeightedGraph generated by LightGraphs doesn't cause me any trouble either makes me suspect that there is some problem with the graph created by FlashWeave...

p.s. I can run your code as well.

w = zeros(6558,6558);

for e in edges(g)
    r = rand()
    w[src(e), dst(e)] = r
    w[dst(e), src(e)] = r
end

s = SimpleWeightedGraph(g)
# {6558, 8484} undirected simple Int64 graph with Float64 weights

using SparseArrays

s.weights = sparse(w);   # don't ever do this in real life

@btime betweenness_centrality(s)
# 18.264 s (39724212 allocations: 7.36 GiB)

OK, so let's take a deeper look at G, the graph that FlashWeave produces. Can you save G.weights somewhere where I can download it?

I uploaded the graph here

using JLD2

@load "g_weights.jld2" G

G.weights # 6558×6558 SparseArrays.SparseMatrixCSC{Float64,Int64} with 16968 stored entries:

@time centrality_arr  = betweenness_centrality(G)  # the graph that causes the issue

Ah, I see the problem:

julia> minimum(g.weights)
-1.0

Can't have negative weights with Dijkstra.

Ah, that makes sense... I just started working with graphs, so I didn't think about that. Maybe it should give an error.

We eschew error checking for things that represent "illegal" behavior (particularly when the error checking causes performance impacts). In this case, since betweenness_centrality uses Dijkstra, and Dijkstra hasn't ever allowed negative graph weights, we assume that our users understand the proper inputs.
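Given that, a caller-side guard is cheap to add before handing a weighted graph to any Dijkstra-based routine. A sketch (the checked_betweenness helper is made up for illustration; g.weights.nzval holds the stored entries of the sparse weight matrix):

```julia
using LightGraphs, SimpleWeightedGraphs

function checked_betweenness(g::SimpleWeightedGraph)
    # Dijkstra requires non-negative weights; fail loudly instead of looping.
    minimum(g.weights.nzval) >= 0 ||
        throw(ArgumentError("graph has negative edge weights"))
    return betweenness_centrality(g)
end

gw = SimpleWeightedGraph(smallgraph(:house))   # all weights are 1.0
checked_betweenness(gw)                        # fine

# Mutate a weight to be negative, just for the demo.
gw.weights[1, 2] = gw.weights[2, 1] = -1.0
checked_betweenness(gw)                        # throws ArgumentError
```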

Fair enough. I'll inform myself better next time... Thanks for the help

No worries! You might ask the folks at FlashWeave whether negative weights are significant. Lots of things in LightGraphs will break on negative weights.

I informed the Developer of FlashWeave about this thread. The point is that in this use case negative weights have biological meaning.

The main issue is that analyzing these graphs comes with the need to properly understand how metrics like betweenness centrality are calculated. But that is more of a thing that the user (me in this case) needs to understand.

If negative weights are important/unavoidable, then your options for doing graph analytics involving anything related to (or using) pathing or path discovery are limited. There's not much we can do about this except for possibly accepting some version of the algorithms (betweenness_centrality in this case) that uses Bellman-Ford, which handles negative weights (at a significant penalty to performance).
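For completeness, LightGraphs already exposes bellman_ford_shortest_paths, which handles negative weights as long as no negative cycle exists. One caveat for this use case: in an undirected graph, any negative edge already forms a negative cycle (you can traverse it back and forth), so a Bellman-Ford-based betweenness would only be well-defined on directed graphs. A small directed sketch:

```julia
using LightGraphs

# 1 → 2 → 3, plus a direct 1 → 3 edge.
g = SimpleDiGraph(3)
add_edge!(g, 1, 2)
add_edge!(g, 2, 3)
add_edge!(g, 1, 3)

w = zeros(3, 3)
w[1, 2] = 2.0
w[2, 3] = -1.0   # negative edge, but no negative cycle
w[1, 3] = 4.0

bf = bellman_ford_shortest_paths(g, 1, w)
bf.dists   # [0.0, 2.0, 1.0] — the detour through the negative edge wins
```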