gridap/GridapP4est.jl

Function `assemble_vector` crashes with `LinearizedFESpace` when running on multiple processors

wei3li opened this issue · 3 comments

wei3li commented

Hi @amartinhuertas,

The following code crashes when running on more than one processor:

using Gridap, GridapDistributed, GridapP4est, PartitionedArrays


with_mpi() do distribute
  order = 2
  parts = (1, 2)
  ranks = distribute(LinearIndices((prod(parts),)))

  coarse_model = CartesianDiscreteModel((0, 1, 0, 1), (1, 1))
  model = OctreeDistributedDiscreteModel(ranks, coarse_model, 3)

  reffe = ReferenceFE(lagrangian, Float64, order)
  degree = 2 * order + 1

  space_args = Dict(:conformity => :H1, :dirichlet_tags => "boundary")
  U = TrialFESpace(FESpace(model, reffe; space_args...), 1.0)
  V, lin_model = Gridap.LinearizedFESpace(model, reffe; space_args...)
  Ω = Triangulation(lin_model)
  dΩ = Measure(Ω, (degree + order - 1) ÷ order)

  a(u, v) = ∫(∇(v) ⋅ ∇(u))dΩ
  assemble_vector(a(get_fe_basis(U), zero(V)), U)
end
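
For reference, a possible way to launch the script on 2 ranks, assuming it is saved as assemble_vector_issue.jl (the file name that appears in the stack trace below), that the --project=. flag matches your setup, and that a recent MPI.jl (where mpiexec() returns a Cmd) is available in the active project:

using MPI

# Run the MWE on 2 MPI ranks, matching parts = (1, 2).
run(`$(MPI.mpiexec()) -n 2 $(Base.julia_cmd()) --project=. assemble_vector_issue.jl`)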

The error trace is:

┌ Error:
│   exception =
│    AssertionError: length(cell_vals) == length(cell_rows)
│    Stacktrace:
│      [1] _numeric_loop_vector!(...)
│        @ Gridap.FESpaces ~/Programs/.julia/packages/Gridap/pKtni/src/FESpaces/SparseMatrixAssemblers.jl:282
│      [2] numeric_loop_vector!(...)
│        @ Gridap.FESpaces ~/Programs/.julia/packages/Gridap/pKtni/src/FESpaces/SparseMatrixAssemblers.jl:274
│      [3] map(...)
│        @ PartitionedArrays ~/Programs/.julia/packages/PartitionedArrays/py6uo/src/mpi_array.jl:237
│      [4] numeric_loop_vector!(...)
│        @ GridapDistributed ~/Programs/.julia/packages/GridapDistributed/a9WXb/src/FESpaces.jl:609
│      [5] assemble_vector(...)
│        @ Gridap.FESpaces ~/Programs/.julia/packages/Gridap/pKtni/src/FESpaces/SparseMatrixAssemblers.jl:43
│      [6] assemble_vector(...)
│        @ Gridap.FESpaces ~/Programs/.julia/packages/Gridap/pKtni/src/FESpaces/Assemblers.jl:325
│      [7] assemble_vector(...)
│        @ Gridap.FESpaces ~/Programs/.julia/packages/Gridap/pKtni/src/FESpaces/Assemblers.jl:352
│      [8] (::var"#1#2")(distribute::PartitionedArrays.var"#88#89"{MPI.Comm, Bool})
│        @ Main ~/Documents/mpi-examples/assemble_vector_issue.jl:22
│      [9] with_mpi(f::var"#1#2"; comm::MPI.Comm, duplicate_comm::Bool)
│        @ PartitionedArrays ~/Programs/.julia/packages/PartitionedArrays/py6uo/src/mpi_array.jl:73
│     [10] with_mpi(f::Function)
│        @ PartitionedArrays ~/Programs/.julia/packages/PartitionedArrays/py6uo/src/mpi_array.jl:65
│     [11] top-level scope
│        @ ~/Documents/mpi-examples/assemble_vector_issue.jl:4
│     [12] include(mod::Module, _path::String)
│        @ Base ./Base.jl:418
│     [13] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:292
│     [14] _start()
│        @ Base ./client.jl:495
└ @ PartitionedArrays ~/Programs/.julia/packages/PartitionedArrays/py6uo/src/mpi_array.jl:75

The linearized model and the original model seem to have different numbers of cells. I think you can either

  1. work only with V, i.e., use the fe_basis from V instead of U, or
  2. define U on lin_model, so that it is defined on the same triangulation as V (both options are sketched below).

Alternatively, you could define a glue between the two triangulations (something that, I believe, is not automatic right now) that allows you to move from one to the other.
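
A minimal sketch of the first two options, reusing the variable names from the MWE above (U2 is just an illustrative name, and neither variant has been verified here to avoid the crash):

# Option 1 (sketch): take the test basis from V, so everything lives on the
# linearized triangulation that dΩ was built on.
assemble_vector(a(get_fe_basis(V), zero(V)), V)

# Option 2 (sketch): build the trial space directly on lin_model, so that it
# shares its triangulation with V.
U2 = TrialFESpace(FESpace(lin_model, reffe; space_args...), 1.0)
assemble_vector(a(get_fe_basis(U2), zero(V)), U2)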

I can reproduce the error in my local environment. Will take a look and come back to you.

@JordiManyer ... this is a very recent development (LinearizedFESpace on top of non-conforming meshes + parallel).

The error has to do with a mismatch between local arrays that include ghost cells and local arrays that do not (see the diagnostic sketch below).
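
A minimal diagnostic sketch of what I mean, intended to be dropped into the with_mpi block of the MWE right after lin_model is built; it assumes Triangulation can be called on model the same way it is called on lin_model, and it just prints per-rank cell counts so a with-ghosts vs without-ghosts mismatch becomes visible:

using GridapDistributed: local_views

# On each rank, compare the number of cells seen by the local portions of the
# two triangulations.
map(local_views(Triangulation(model)), local_views(Triangulation(lin_model))) do Ωo, Ωl
  println("original cells: ", num_cells(Ωo), ", linearized cells: ", num_cells(Ωl))
end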