nv-legate/legate.core

Error when using --profile with canonical interpreter

manopapad opened this issue · 1 comment

I am getting this error when I try to profile using the canonical interpreter on my Mac.

I don't know if the use of the default mapper is relevant here, but note that the custom legion_python main does set its own mapper for the top-level task.

~/cunumeric> LEGATE_CONFIG="--verbose --profile" python a.py

--- Legion Python Configuration ------------------------------------------------

Legate paths:
  legate_dir       : /Users/mpapadakis/legate.core
  legate_build_dir : /Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build
  bind_sh_path     : /Users/mpapadakis/legate.core/bind.sh
  legate_lib_path  : /Users/mpapadakis/legate.core/build/lib

Legion paths:
  legion_bin_path       : /Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/bin
  legion_lib_path       : /Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/lib
  realm_defines_h       : /Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/runtime/realm_defines.h
  legion_defines_h      : /Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/runtime/legion_defines.h
  legion_spy_py         : /Users/mpapadakis/legion/tools/legion_spy.py
  legion_prof_py        : /Users/mpapadakis/legion/tools/legion_prof.py
  legion_python         : /Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/bin/legion_python
  legion_module         : /Users/mpapadakis/legion/bindings/python/build/lib
  legion_jupyter_module : /Users/mpapadakis/legion/jupyter_notebook

Versions:
  legate_version : 23.03.00.dev+38.g02bb2be

Command:
  -lg:local 0 -ll:cpu 4 -ll:util 2 -ll:csize 4000 -ll:networks none -lg:prof 1 -lg:prof_logfile /Users/mpapadakis/cunumeric/legate_%.prof -level openmp=5,legion_prof=2 -lg:eager_alloc_percentage 50 -ucx:tls_host '^dc,ud'

Customized Environment:
  DYLD_LIBRARY_PATH=/Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/lib:/Users/mpapadakis/legate.core/build/lib
  GASNET_MPI_THREAD=MPI_THREAD_MULTIPLE
  LEGATE_MAX_DIM=4
  LEGATE_MAX_FIELDS=256
  NCCL_LAUNCH_MODE=PARALLEL
  PYTHONDONTWRITEBYTECODE=1
  PYTHONPATH=/Users/mpapadakis/legion/bindings/python/build/lib:/Users/mpapadakis/legion/jupyter_notebook:/Users/mpapadakis/legate.core/legate
  REALM_BACKTRACE=1
  REALM_UCP_BOOTSTRAP_PLUGIN=/Users/mpapadakis/legate.core/_skbuild/macosx--x86_64-3.9/cmake-build/_deps/legion-build/lib/realm_ucp_bootstrap_mpi.so
  UCX_CUDA_COPY_MAX_REG_RATIO=1.0
  UCX_IB_RCACHE_PURGE_ON_FORK=n
  UCX_MULTI_LANE_MAX_RATIO=1.0
  UCX_RC_TX_POLL_ALWAYS=y

--------------------------------------------------------------------------------

[0 - 10f9d9dc0]    0.000225 {4}{threads}: reservation ('CPU proc 1d00000000000003') cannot be satisfied
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! YOU ARE PROFILING IN DEBUG MODE           !!!
!!! SERIOUS PERFORMANCE DEGRADATION WILL OCCUR!!!
!!! COMPILE WITH DEBUG=0 FOR PROFILING        !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

SLEEPING FOR 5 SECONDS SO YOU READ THIS WARNING...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! YOU ARE PROFILING USING THE DEFAULT MAPPER!!!
!!! THE DEFAULT MAPPER IS NOT FOR PERFORMANCE !!!
!!! PLEASE CUSTOMIZE YOUR MAPPER TO YOUR      !!!
!!! APPLICATION AND TO YOUR TARGET MACHINE    !!!
First use of the default mapper in address space 0
occurred when task legion_python_top_level_task (UID 1) invoked the "configure_context" mapper call
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!WARNING WARNING WARNING WARNING WARNING WARNING!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

[0 - 10f9d9dc0]    5.049820 {6}{realm}: invalid processor handle: id=0
Assertion failed: (0 && "invalid processor handle"), function get_processor_impl, file /Users/mpapadakis/legion/runtime/realm/runtime_impl.cc, line 2689.
Signal 6 received by node 0, process 82898 (thread 10f9d9dc0) - obtaining backtrace
Signal 6 received by process 82898 (thread 10f9d9dc0) at: stack trace: 18 frames
  [0] = 0   libsystem_platform.dylib            0x00007fff6ff3a5fd _sigtramp + 29
  [1] = 0   librealm.1.dylib                    0x000000019f7f4555 _ZNSt3__117__compressed_pairIPNS_11__tree_nodeINS_12__value_typeIjN5Realm18LocalTaskProcessor14TaskTableEntryEEEPvEENS_22__tree_node_destructorINS_9allocatorIS8_EEEEE5firstB6v15007Ev + 21
  [2] = 0   libsystem_c.dylib                   0x00007fff6fe10808 abort + 120
  [3] = 0   libsystem_c.dylib                   0x00007fff6fe0fac6 err + 0
  [4] = 0   librealm.1.dylib                    0x000000019f828c0a _ZN5Realm11RuntimeImpl18get_processor_implENS_2IDE + 282
  [5] = 0   librealm.1.dylib                    0x000000019f7d4193 _ZNK5Realm9Processor4kindEv + 83
  [6] = 0   liblegion.1.dylib                   0x00000001993e4eac _ZN6Legion8Internal18LegionProfInstance17process_proc_descERKN5Realm9ProcessorE + 268
  [7] = 0   liblegion.1.dylib                   0x00000001993f42b2 _ZN6Legion8Internal14LegionProfiler18record_mapper_callENS0_15MappingCallKindEyyy + 114
  [8] = 0   liblegion.1.dylib                   0x000000019978fdb7 _ZN6Legion8Internal13MapperManager14free_call_infoEPNS0_15MappingCallInfoE + 135
  [9] = 0   liblegion.1.dylib                   0x00000001997922ae _ZN6Legion8Internal18SerializingManager18finish_mapper_callEPNS0_15MappingCallInfoE + 446
  [10] = 0   liblegion.1.dylib                   0x0000000199784056 _ZN6Legion8Internal13MapperManager24invoke_configure_contextEPNS0_6TaskOpEPNS_7Mapping6Mapper19ContextConfigOutputEPNS0_15MappingCallInfoE + 342
  [11] = 0   liblegion.1.dylib                   0x00000001990b9229 _ZN6Legion8Internal12InnerContext17configure_contextEPNS0_13MapperManagerEi + 89
  [12] = 0   liblegion.1.dylib                   0x0000000199584f41 _ZN6Legion8Internal10SingleTask23create_implicit_contextEv + 353
  [13] = 0   liblegion.1.dylib                   0x0000000199980070 _ZN6Legion8Internal7Runtime19begin_implicit_taskEjjN5Realm9Processor4KindEPKcbjiRKNS_11DomainPointE + 976
  [14] = 0   liblegion.1.dylib                   0x000000019922d924 _ZN6Legion7Runtime19begin_implicit_taskEjjN5Realm9Processor4KindEPKcbjiNS_11DomainPointE + 132
  [15] = 0   liblegion_canonical_python.1.dylib  0x0000000198c45844 legion_canonical_python_begin_top_level_task + 436
  [16] = 0   libffi.8.dylib                      0x000000014fb37d92 ffi_call_unix64 + 82
  [17] = 0   ???                                 0x00000001a4e1a4f0 0x0 + 7061218544

This is because Realm does not know about the "processor" (really the external thread running main) that is used to run the implicit top-level task. I can reproduce the bug with this example: https://gitlab.com/StanfordLegion/legion/-/tree/control_replication/examples/implicit_top_task
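
For reference, the pattern in that example looks roughly like the sketch below. This is a from-memory outline, not the example verbatim: TOP_LEVEL_TASK_ID is a placeholder and the trailing begin_implicit_task arguments are left at their defaults. The comments point at where the profiler trips over the missing Realm processor.

// Sketch of the implicit top-level task pattern used by the canonical
// interpreter (based on the linked implicit_top_task example; details are
// approximate).
#include "legion.h"
using namespace Legion;

enum TaskIDs { TOP_LEVEL_TASK_ID = 0 };  // placeholder task ID

int main(int argc, char **argv)
{
  // Start the Legion runtime in the background; this external thread,
  // for which Realm has no Processor object, keeps control.
  Runtime::start(argc, argv, true /*background*/);

  // Promote the calling thread into an implicit top-level task. Per the
  // backtrace above, the profiler crashes inside this call: the
  // configure_context mapper call gets recorded against a processor
  // handle with id=0, and Processor::kind() then aborts.
  Runtime *runtime = Runtime::get_runtime();
  Context ctx = runtime->begin_implicit_task(TOP_LEVEL_TASK_ID,
                                             0 /*mapper ID*/,
                                             Processor::LOC_PROC,
                                             "top_level_task",
                                             true /*control replicable*/);

  // ... application work ...

  runtime->finish_implicit_task(ctx);
  return Runtime::wait_for_shutdown();
}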

We can get rid of this error by applying this patch:

diff --git a/runtime/legion/legion_profiling.cc b/runtime/legion/legion_profiling.cc
index a6bd744fd..3bf75a90a 100644
--- a/runtime/legion/legion_profiling.cc
+++ b/runtime/legion/legion_profiling.cc
@@ -767,7 +767,11 @@ namespace Legion {
       proc_desc_infos.emplace_back(ProcDesc());
       ProcDesc &info = proc_desc_infos.back();
       info.proc_id = p.id;
-      info.kind = p.kind();
+      if (p.id == 0) {
+        info.kind = Legion::Internal::ProcKind::LOC_PROC;
+      } else {
+        info.kind = p.kind();
+      }
       const size_t diff = sizeof(ProcDesc);
       owner->update_footprint(diff, this);
       process_proc_mem_aff_desc(p);
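
The same guard could also be written against Realm's Processor::exists() helper (exists() is simply id != 0), which makes the intent a bit more explicit. This is only a cosmetic variant of the hunk above, and it leaves open the same question of whether LOC_PROC is the right kind to report for the external thread:

      // Skip the kind() query for any handle that does not name a real
      // Realm processor (the implicit external thread shows up as id=0).
      if (!p.exists())
        info.kind = Legion::Internal::ProcKind::LOC_PROC;
      else
        info.kind = p.kind();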

However, even with this patch the implicit top-level task is not visible in the profiler output, which could be a bug in legion_prof.py. We need to ask @lightsighter what the best way to fix this error is.