JuliaSmoothOptimizers/MUMPS.jl

Segmentation Fault when solving for many rhs

lruthotto opened this issue · 3 comments

I'm getting a segmentation fault when solving

  1. a small linear system with many (a few thousand) right-hand sides (rhs)
  2. a large linear system with a moderate number of rhs

Below is the example. It would be interesting to see if this works for other people.

On a side note, on my machine, factorizing the large identity matrix takes a long time in test number 2.

BTW: I'm using Mac OS X Yosemite (10.10.5) and Julia version 0.4.6-pre+30.

using Base.Test
using MUMPS
using MPI

root = 0;

# Initialize MPI.
MPI.Init()
comm = MPI.COMM_WORLD

mumps = Mumps{Float64}(mumps_symmetric, default_icntl, default_cntl64);

## test 1) small matrix, large number of rhs --> fails
n = 1000; nrhs = 100000;
## test 2) large matrix, moderate number of rhs --> also fails
# n = 139425; nrhs = 130;

A = speye(n)
rhs = randn(n,nrhs)
if MPI.Comm_rank(comm) == root
  associate_matrix(mumps, A);
  associate_rhs(mumps, copy(rhs));
end

factorize(mumps, A);
solve(mumps)
MPI.Barrier(comm)

if MPI.Comm_rank(comm) == root
  x = get_solution(mumps)
  # x is only defined on the root rank, so check the solution there.
  @test norm(x - A\rhs) / norm(x) < 1e-8
end
finalize(mumps);
MPI.Finalize()

dpo commented

I guess it's an issue of the amount of memory that's available. Test 1 succeeds for me if I decrease the number of rhs. Allocating 100,000 Float64 vectors of size 1000 consumes about 6Gb of memory. The method associate_rhs makes a copy because MUMPS overwrites the rhs with the solution, so we're at 12Gb. If I change this line to x = rhs, which doesn't make a copy, then test 1 runs. I have 16Gb of RAM.

Can you confirm whether it also works for you?
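
To sanity-check a memory estimate like the one above, the raw storage for one dense copy of rhs can be computed directly as n × nrhs × sizeof(Float64). The sketch below is written for current Julia (1.x), and dense_bytes is a hypothetical helper, not part of MUMPS.jl. It counts only the array data itself. Every further copy (copy(rhs) in the script, the copy made by associate_rhs, the solution, the result of A\rhs in the test) adds the same amount again, and MUMPS's own workspace is not included.

```julia
# Hypothetical helper (not part of MUMPS.jl): bytes needed to store
# one dense n-by-nrhs array with element type T.
dense_bytes(n, nrhs, T=Float64) = n * nrhs * sizeof(T)

# The two configurations from the script in this issue.
for (label, n, nrhs) in (("test 1", 1000, 100000), ("test 2", 139425, 130))
    gb = dense_bytes(n, nrhs) / 1e9   # decimal gigabytes
    println(label, ": ", round(gb, digits=2), " GB per dense copy of rhs")
end
```

Multiplying the per-copy figure by the number of live copies gives a lower bound on what the script needs at its peak.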

Test 2 also takes quite a bit of time on my machine (OS X 10.9) and eventually fails with

Entering DMUMPS 5.0.1 driver with JOB, N, NZ =  -2      139425         139425
BLAS : Bad memory unallocation! :    8  0x7fff591a1270
BLAS : Bad memory unallocation! :    8  0x7fff591a1270
BLAS : Bad memory unallocation! :    8  0x7fff591a1270
BLAS : Bad memory unallocation! :    8  0x7fff591a1270
BLAS : Bad memory unallocation! :    8  0x7fff591a1270
BLAS : Bad memory unallocation! :    8  0x7fff591a1270

which appears to be related to OpenBLAS.

UPDATE: after upgrading to Julia 0.4.5 (I was on 0.4.3), the problem went away and test 2 runs.

dpo commented

The inplace branch introduces associate_rhs!, which does not make a copy. Let us know if that resolves the issue for you.
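
The difference between a copying and a non-copying variant can be illustrated with a toy example. The sketch below is written for current Julia (1.x) and does not use MUMPS.jl at all: ToySolver, attach_rhs, attach_rhs! and toy_solve! are hypothetical stand-ins, chosen only to show that the in-place form shares memory with the caller's array (so the caller's rhs is overwritten by the solution) while the copying form allocates a second array.

```julia
# Hypothetical stand-ins for a solver that overwrites its right-hand side.
# These are NOT the MUMPS.jl implementations, only an illustration.
mutable struct ToySolver
    rhs::Matrix{Float64}
    ToySolver() = new(zeros(0, 0))
end

# Copying variant: the solver works on its own array.
attach_rhs(s::ToySolver, rhs::Matrix{Float64}) = (s.rhs = copy(rhs); s)

# In-place variant: the solver keeps a reference to the caller's array.
attach_rhs!(s::ToySolver, rhs::Matrix{Float64}) = (s.rhs = rhs; s)

# Toy "solve" of (2I) x = rhs, overwriting the stored rhs with x.
toy_solve!(s::ToySolver) = (s.rhs ./= 2; s)

b = ones(3, 2)

s1 = toy_solve!(attach_rhs(ToySolver(), b))
copied_is_alias = s1.rhs === b       # false: a second array was allocated
b_after_copy = copy(b)               # snapshot: b is still all ones here

s2 = toy_solve!(attach_rhs!(ToySolver(), b))
inplace_is_alias = s2.rhs === b      # true: no extra memory, b now holds x
```

The trade-off is the usual one for in-place APIs: the memory for one extra dense array is saved, but the caller must keep a copy if the original rhs is still needed afterwards.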

dpo commented

I merged #10. Please reopen if issues remain.