StanfordLegion/legion

Realm: replicated heap exhausted when creating many compact instances

MoraruMaxim opened this issue · 8 comments

In map_copy, we would like to create compact instances for the source requirements that have sparse domains. Something like:

if (IS_SRC && !req_domain.dense())
  creation_constraints.add_constraint(
      Legion::SpecializedConstraint(LEGION_COMPACT_SPECIALIZE));
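
In fuller, still schematic, form (names like idx and target_memory stand in for the surrounding mapper setup and are illustrative rather than our actual code):

// Sketch only: idx and target_memory are assumed to have been chosen
// earlier in map_copy; the usual constraints go where indicated.
const Legion::RegionRequirement &req = copy.src_requirements[idx];
Legion::Domain req_domain =
    runtime->get_index_space_domain(ctx, req.region.get_index_space());
Legion::LayoutConstraintSet creation_constraints;
// ... memory, field, and ordering constraints as usual ...
if (IS_SRC && !req_domain.dense())
  creation_constraints.add_constraint(
      Legion::SpecializedConstraint(LEGION_COMPACT_SPECIALIZE));
Legion::Mapping::PhysicalInstance instance;
if (!runtime->create_physical_instance(ctx, target_memory, creation_constraints,
        std::vector<Legion::LogicalRegion>(1, req.region), instance)) {
  // handle the allocation failure (e.g. retry in another memory)
}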

It works correctly on smaller problem sizes. However, when we try to scale up (i.e., run on 4 nodes with a larger problem size), we get the following error:
FATAL: replicated heap exhausted, grow with -ll:replheap - at least 17308992 bytes required

With the following backtrace:

Thread 10 "poisson" received signal SIGABRT, Aborted.
[Switching to Thread 0x154a52eb8e40 (LWP 43000)]
0x0000154a9250bacf in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000154a9250bacf in raise () from /lib64/libc.so.6
#1  0x0000154a924deea5 in abort () from /lib64/libc.so.6
#2  0x0000154a955ea88d in Realm::ReplicatedHeap::alloc_obj(unsigned long, unsigned long) ()
   from .../lib64/librealm.so.1
#3  0x0000154a9558f4d6 in Realm::InstanceLayout<2, long long>::compile_lookup_program(Realm::PieceLookup::CompiledProgram&) const ()
   from .../lib64/librealm.so.1
#4  0x0000154a95572832 in Realm::RegionInstanceImpl::Metadata::deserialize(void const*, unsigned long) ()
   from .../lib64/librealm.so.1
#5  0x0000154a955c3e09 in Realm::MetadataResponseMessage::handle_message(int, Realm::MetadataResponseMessage const&, void const*, unsigned long) ()
   from .../lib64/librealm.so.1
#6  0x0000154a9567c143 in Realm::IncomingMessageManager::do_work(Realm::TimeLimit) ()
   from .../lib64/librealm.so.1
#7  0x0000154a95547fdc in Realm::BackgroundWorkManager::Worker::do_work(long long, Realm::atomic<bool>*) ()
   from .../lib64/librealm.so.1
#8  0x0000154a95548801 in Realm::BackgroundWorkThread::main_loop() ()
   from .../lib64/librealm.so.1
#9  0x0000154a9563036f in Realm::KernelThread::pthread_entry(void*) ()
   from .../lib64/librealm.so.1
#10 0x0000154a949931ca in start_thread () from /lib64/libpthread.so.0
#11 0x0000154a924f6e73 in clone () from /lib64/libc.so.6

The default size of the replicated heap is, I believe, 16777216 bytes (16 MiB). Did you try passing the -ll:replheap flag?
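
For example, to double the default (the value here is illustrative, and whether the option takes raw bytes or a unit suffix is up to Realm's command-line parsing, so check it against your build):

    ./poisson <other args> -ll:replheap 33554432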

Increasing the replheap size helps, but then, when I increase the number of nodes and the problem size (weak scaling), I need an even larger replheap.

It only happens when we use compact instances. Is this the only way to use compact instances at large scale (i.e., keep increasing the size of the replicated heap)?

> Increasing the replheap size helps, but then, when I increase the number of nodes and the problem size (weak scaling), I need an even larger replheap.

It should grow only in proportion to the number of compact instances that you make, not their sizes. As long as you are making new compact instances (presumably for nearest neighbors in the mesh), the amount of space required will continue to grow. However, I presume that at some point your application will hit a maximum number of nearest neighbors, and then the amount of replicated heap memory you need will plateau. My understanding of the upper bound on the number of nearest neighbors in FleCSI applications comes from conversations with @opensdh, so please correct me if I'm wrong about that.

By "their sizes", do you mean the count of index points contained or the count of rectangles required to describe them or both? I agree that reasonable usage of most topologies will have a bounded degree in the color-communication graph, such that the number of compact instances relevant to any one point-copy in an index copy launch will be bounded.

By "their sizes", do you mean the count of index points contained or the count of rectangles required to describe them or both?

I mean the total amount of memory required to represent the instance. The amount of memory needed in the replicated heap may be proportional to the number of rectangles in the compact instances, but I presume that the number of those rectangles is proportional to the number of nearest neighbors. The amount of memory required in the replicated heap will be independent of the total volume of those rectangles, though.
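
(As an illustrative contrast: a compact instance described by a single 1024x1024 rectangle needs metadata for just one piece, while an instance described by 1024 isolated points needs metadata for 1024 pieces, even though the former covers roughly a thousand times more index points.)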

For a structured mesh in FleCSI, the number of rectangles is proportional to the surface (hyper)area of the subdomain in question, because the linearization required for the two-dimensional index-space layout (with coordinates (c, i)) causes every index point on the faces normal to one dimension to be isolated. For scaling of any one problem, this is constant (weak) or decreasing (strong), but it does scale with problem size.

For an unstructured mesh, yes, we could arrange for the number of rectangles to be proportional to the number of nearest neighbors, although I don't think that has been implemented yet (because the first step was making the destinations on one color contiguous).

> For scaling of any one problem, this is constant (weak) or decreasing (strong), but it does scale with problem size.

To be clear, you mean this is the number of rectangles required by each (FleCSI) color, right?

Is this particular case where the repl heap is being exhausted a structured or unstructured mesh? Also, are we weak or strong scaling?

> For scaling of any one problem, this is constant (weak) or decreasing (strong), but it does scale with problem size.

> To be clear, you mean this is the number of rectangles required by each (FleCSI) color, right?

Yes: the rectangle count per color scales with the ((d-1)/d power of the) problem size per color.
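
(To spell out the arithmetic with an idealized cubical block: a color owning an n^d block holds V = n^d points, and the isolated ghost points lie on faces of that block, so the rectangle count is O(n^(d-1)) = O(V^((d-1)/d)). In 2D, for instance, each of the ~2n points on the two faces normal to the contiguous dimension becomes its own rectangle, while the other two faces remain contiguous runs, giving O(n) = O(V^(1/2)) rectangles per color.)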

> Is this particular case where the repl heap is being exhausted a structured or unstructured mesh? Also, are we weak or strong scaling?

The original post here concerns a structured mesh; a previous comment mentions weak scaling for it.