Using caliper above threaded invocation
gshipman opened this issue · 13 comments
@jonahm-LANL and I discussed this this week. I'm looking for a way to mark caliper regions that are above threaded dispatch of tasks.
I'm currently directly instrumenting tasks within the benchmark, see: https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_package.cpp#L55
If there is a place in the task dispatch machinery where I can add a caliper begin/end region and perhaps capture the name / symbol of the function that will be dispatched, or some other application meaningful metadata to properly name the region that might be one approach. Of course I would be measuring the invocation start/finish which may differ from execution start/finish, depending on the way the machinery works.
Any pointers? Thx.
The thought I had when we were discussing earlier in the week was you could instrument the functions that are added to the task list like here:
https://github.com/gshipman/parthenon/blob/d565f7810727b87c16a52979c0c0f7fee060c461/benchmarks/burgers/burgers_driver.cpp#L105
For exmaple, you could replace
auto update = tl.AddTask(avg_data, UpdateIndependentData<MeshData<Real>>, mc0.get(),
mdudt.get(), beta * dt, mc1.get());
with
auto update = tl.AddTask(avg_data,
[=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) {
CALI_CXX_MARK_FUNCTION;
PARTHENON_INSTRUMENT
UpdateIndependentData(a, b, w, c);
},
mc0.get(), mdudt.get(), beta * dt, mc1.get());
I'm not sure if what Calibper gets out of this will be useful... but you could also replace the anonymous function with a named one via a macro instead of a lambda, for example.
This would let you insert the instrumentation at the tasking level for any task you wanted to instrument.
Thx @Yurlungur I will give this a whirl. Since the task registration site is within the benchmark code I can use CALI_MARK_BEGIN/END as in:
auto update = tl.AddTask(avg_data,
[=](MeshData<Real> *a, MeshData<Real> *b, Real w, MeshData<Real> *c) {
CALI_MARK_BEGIN("UpdateIndependentData");
PARTHENON_INSTRUMENT
UpdateIndependentData(a, b, w, c);
CALI_MARK_END("UpdateIndependentData");
},
mc0.get(), mdudt.get(), beta * dt, mc1.get());
@Yurlungur Take a look here: https://github.com/gshipman/parthenon/blob/2f42a75dd1bae6a92b53ee93d0931f18a3846ad2/benchmarks/burgers/burgers_driver.cpp#L100
I'm getting a build error:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:21:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.hpp:20:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/parthenon/driver.hpp:18:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/application_input.hpp:22:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/bvals/boundary_conditions.hpp:22:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/meshblock_data.hpp:24:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/sparse_pack_base.hpp:30:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/state_descriptor.hpp:29:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/interface/metadata.hpp:30:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/prolong_restrict/prolong_restrict.hpp:31:
In file included from /usr/projects/eap/users/gshipman/parthenon-cali/src/bvals/comms/bvals_in_one.hpp:29:
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:382:18: error: cannot initialize return object of type 'TaskStatus' with an rvalue of type 'void'
return func(std::forward<Args>(args)...);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:217:7: note: in instantiation of function template specialization 'parthenon::TaskList::AddUserTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
AddUserTask(dep, std::forward<Args>(args)...);
^
/usr/projects/eap/users/gshipman/parthenon-cali/src/tasks/tasks.hpp:207:12: note: in instantiation of function template specialization 'parthenon::TaskList::AddTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
return AddTask(TaskQualifier::normal, dep, std::forward<Args>(args)...);
^
/usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:100:19: note: in instantiation of function template specialization 'parthenon::TaskList::AddTask<(lambda at /usr/projects/eap/users/gshipman/parthenon-cali/benchmarks/burgers/burgers_driver.cpp:101:6), parthenon::MeshData<double> *>' requested here
auto flx = tl.AddTask(none,
^
6 warnings and 1 error generated.
oops my bad. Try this:
auto flx = tl.AddTask(none,
[=](MeshData<Real> *a) {
CALI_MARK_BEGIN("CalculateFluxes");
auto status = burgers_package::CalculateFluxes(a);
CALI_MARK_END("CalculateFluxes");
return status;
}, mc0.get());
Tasks need to return a TaskStatus
enum. Forgot about that.
@Yurlungur , Looks like adding the CALI_MARK_BEGIN/END within AddTask is still problematic, I think that is running in a separate thread dispatched in TaskRegion::Execute().. So I moved CALI_MARK_BEGIN/END to here: https://github.com/gshipman/parthenon/blob/418c5d0b9a683ac88ff609f7246654af4aa6285c/src/tasks/tasks.hpp#L416
This allows a successful run. Is my observation correct (that the task is dispatched within tasks.hpp on a separate thread)?
If so, I need a way at getting at an identified for the task, ideally a string that can be generated during task initialization. I'd need a little guidance on how to do that.
Thanks!
@Yurlungur I've done a bit more work, I've added text names to tasks, and can successfully "caliper" around task enqueue. See:
https://github.com/gshipman/parthenon/blob/4be8412180e21ee1e68a60700ebd3cf3f18cc179/src/tasks/tasks.hpp#L440
But this isn't what we want, and it might not be possible to get what I want, I need to capture the task "start" and "end" from the main thread. I'm thinking this is going to be difficult without having the main thread polling a semaphore for start/stop or something. Thoughts?
@gshipman Ah, yes @jdolence recently rewrote the tasking infrastructure to use threads... however I think during normal operation there's still actually only one thread, as the rest of Parthenon isn't yet thread safe. I think tasks are all executed in serial on a given MPI rank---with the note that Kokkos loops may be non-blocking if you're on an accelerator. @jdolence should confirm.
I don't think you want to wrap around task enqueue
as that is just telling the tasking machinery to put the task into the list of tasks to do work on. It doesn't actually access the task. Or is that what you're trying to time? The tasking infrastructure vs the actual work of the task?
I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled?
I guess I don't understand why timing within an individual task is problematic? At least if the code is run with threading disabled?
It's problematic if the code creates/destroys a new OS thread for each task. Caliper keeps some per-thread data around until program exit so it'll keep adding memory. However If the code runs without threading or with a fixed thread pool there shouldn't be a problem annotating at the task level.
I don't think you want to wrap around task
enqueue
as that is just telling the tasking machinery to put the task into the list of tasks to do work on. It doesn't actually access the task. Or is that what you're trying to time? The tasking infrastructure vs the actual work of the task?
@Yurlungur correct, I wouldn't want to capture enqueue, that was just to make sure I could get something if I was in the parent thread and include the task name in the caliper.
@jdolence, I'm running on Crossroads / Roci, should I just disable threading entirely? I'm not explicitly requesting threading, here is what cmake finds:
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
...
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
-- Device Parallel: NoTypeDefined
-- Host Parallel: NoTypeDefined
-- Host Serial: SERIAL
--
-- Architectures:
-- Found TPLLIBDL: /usr/include
-- Using internal desul_atomics copy
-- Kokkos Devices: SERIAL, Kokkos Backends: SERIAL
-- Using Kokkos source from Parthenon submodule at /usr/projects/eap/users/gshipman/parthenon-cali/external/Kokkos