Segmentation Fault when solving for many rhs
lruthotto opened this issue · 3 comments
I'm getting a segmentation fault when solving
- a small linear system with many (a few thousand) rhs
- a large linear system with a moderate number of rhs
Below is an example. It would be interesting to see if this works for other people.
On a side note, on my machine, inverting the large identity matrix takes a lot of time in test number 2.
BTW: I'm using Mac OS Yosemite (10.10.5) and Julia version 0.4.6-pre+30.
using Base.Test
using MUMPS
using MPI
root = 0;
# Initialize MPI.
MPI.Init()
comm = MPI.COMM_WORLD
mumps = Mumps{Float64}(mumps_symmetric, default_icntl, default_cntl64);
## test 1) small matrix, large number of rhs --> fails
n = 1000; nrhs = 100000
## test 2) large matrix, moderate number of rhs --> also fails
# n = 139425; nrhs = 130;
A = speye(n)
rhs = randn(n,nrhs)
if MPI.Comm_rank(comm) == root
associate_matrix(mumps, A);
associate_rhs(mumps, copy(rhs));
end
factorize(mumps, A);
solve(mumps)
MPI.Barrier(comm)
if MPI.Comm_rank(comm) == root
x = get_solution(mumps)
end
finalize(mumps);
@test norm(x-A\rhs)/norm(x) < 1e-8
MPI.Finalize()
I guess it's an issue of the amount of memory that's available. Test 1 succeeds for me if I decrease the number of rhs. Allocating 100,000 Float64 vectors of size 1000 consumes about 6 GB of memory. The method associate_rhs makes a copy because MUMPS overwrites the rhs with the solution, so we're at 12 GB. If I change this line to x = rhs, which doesn't make a copy, then test 1 runs. I have 16 GB of RAM.
Can you confirm whether it also works for you?
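For illustration, here is a minimal sketch of the copy vs. no-copy distinction described above (the function names and the rhs_buffer field are hypothetical, not the actual MUMPS.jl internals):
# Hypothetical sketch: how a copying vs. an in-place rhs association differ.
# `rhs_buffer` is an assumed field name used only for illustration.
function associate_rhs_with_copy(mumps, rhs)
    mumps.rhs_buffer = copy(rhs)  # extra allocation; MUMPS overwrites this copy with the solution
end
function associate_rhs_in_place(mumps, rhs)
    mumps.rhs_buffer = rhs        # no extra allocation; the caller's array is overwritten in place
end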
Test 2 also takes quite a bit of time on my machine (OS X 10.9) and eventually fails with
Entering DMUMPS 5.0.1 driver with JOB, N, NZ = -2 139425 139425
BLAS : Bad memory unallocation! : 8 0x7fff591a1270
BLAS : Bad memory unallocation! : 8 0x7fff591a1270
BLAS : Bad memory unallocation! : 8 0x7fff591a1270
BLAS : Bad memory unallocation! : 8 0x7fff591a1270
BLAS : Bad memory unallocation! : 8 0x7fff591a1270
BLAS : Bad memory unallocation! : 8 0x7fff591a1270
which appears to be related to OpenBLAS.
UPDATE: after upgrading to Julia 0.4.5 (I was on 0.4.3), the problem went away and test 2 runs.
The inplace branch introduces associate_rhs!, which does not make a copy. Let us know if that resolves the issue for you.
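For reference, here is roughly how the reproduction script above would change on that branch, assuming associate_rhs! takes the same arguments as associate_rhs. Since no copy is made, MUMPS overwrites rhs with the solution, so keep your own copy if you still need the original right-hand sides:
if MPI.Comm_rank(comm) == root
    associate_matrix(mumps, A)
    associate_rhs!(mumps, rhs)  # no copy: rhs itself will be overwritten with the solution
end
factorize(mumps, A)
solve(mumps)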