Problems using cali2traceevent
I have encountered two problems using cali2traceevent (which is a pretty nice tool, btw).
- Every time I try to run cali2traceevent, I get the error below:
Traceback (most recent call last):
  File "cali2traceevent.py", line 369, in <module>
    main()
  File "cali2traceevent.py", line 344, in main
    converter.read_and_sort(input)
  File "cali2traceevent.py", line 151, in read_and_sort
    self._process_record(rec[1])
  File "cali2traceevent.py", line 202, in _process_record
    self._process_event_end_rec(rec, (pid, tid), key, trec)
  File "cali2traceevent.py", line 226, in _process_event_end_rec
    btst = self.rstack[(loc,attr)].pop()
IndexError: pop from empty list
To fix this, I just check that rstack[(loc, attr)] is not empty before calling pop() (see below). I don't know if it's the correct fix, but it produces an output that looks correct for my application.
def _process_event_end_rec(self, rec, loc, key, trec):
    attr = key[len("event.end#"):]
    if self.rstack[(loc,attr)]:
        btst = self.rstack[(loc,attr)].pop()
        tst = _get_timestamp(rec)
        self._get_stackframe(rec, trec)
        trec.update(ph="X", name=rec[key], cat=attr, ts=btst, dur=(tst-btst))
- When I try to use cali2traceevent with an MPI code, all the processors show up as process 0 and thread_id 0. caliper does create multiple event files, but I guess that cali2traceevent cannot deal with them together. Is that correct?
Hi @Rombur,
The stack errors indicate to me that either a begin record is missing somewhere, or records from different process/thread locations got mixed up. Are you tracing a multi-threaded application?
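To illustrate that failure mode, here is a minimal, self-contained sketch of begin/end matching with one timestamp stack per (location, attribute) pair. It is not the cali2traceevent code: the `match_regions` function and the record tuples are hypothetical, made up for this example. An end record that finds no begin on its location is skipped and counted instead of raising `IndexError`.

```python
from collections import defaultdict


def match_regions(records):
    """Pair begin/end records per (location, attribute).

    records: iterable of (kind, loc, attr, name, ts) tuples, where kind is
    "begin" or "end". This layout is hypothetical, for illustration only.
    Returns (events, unmatched): completed regions and a count of skipped ends.
    """
    stacks = defaultdict(list)  # one timestamp stack per (loc, attr)
    events = []
    unmatched = 0

    for kind, loc, attr, name, ts in records:
        stack = stacks[(loc, attr)]
        if kind == "begin":
            stack.append(ts)
        elif stack:
            begin_ts = stack.pop()
            events.append({"name": name, "loc": loc,
                           "ts": begin_ts, "dur": ts - begin_ts})
        else:
            # End without a begin on this location: skip instead of crashing
            unmatched += 1

    return events, unmatched


# The begin of "foo" is attributed to thread 1, but its end to thread 0
records = [
    ("begin", (0, 0), "region", "main", 10),
    ("begin", (0, 1), "region", "foo", 12),
    ("end",   (0, 0), "region", "foo", 15),
    ("end",   (0, 0), "region", "main", 20),
]
events, unmatched = match_regions(records)
```

In this example the end of `foo` consumes the begin of `main`, so the later end of `main` finds an empty stack. A guard like this avoids the crash, but the regions that do get paired can still carry wrong names or durations, so it is worth checking the count of skipped records.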
As for MPI, cali2traceevent can work with multiple MPI ranks. You'll just need to record them with the MPI information, which you can get e.g. with CALI_CONFIG=event-trace,trace.mpi. This will record the MPI rank and MPI calls. You can then merge all the resulting .cali files together, e.g. with: cali2traceevent.py *.cali. That way you should get a separate timeline for each MPI rank.
Timestamps from the MPI processes will likely be out of sync. However, you can add a synchronization record to your trace by adding the function below to your program, calling it somewhere, and using the --sync option for cali2traceevent, e.g. cali2traceevent.py --sync *.cali:
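#include <caliper/cali.h>
#include <mpi.h>

void caliper_ts_sync(MPI_Comm comm)
{
    static int count = 0; /* number of sync points recorded so far */

    cali_id_t attr =
        cali_create_attribute("ts.sync", CALI_TYPE_INT, CALI_ATTR_DEFAULT);
    cali_variant_t val =
        cali_make_variant_from_int(count++);

    /* All ranks leave the barrier at roughly the same time */
    MPI_Barrier(comm);
    /* Record a snapshot carrying the ts.sync attribute */
    cali_push_snapshot(CALI_SCOPE_PROCESS | CALI_SCOPE_THREAD, 1, &attr, &val);
}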
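The idea behind such a sync record can be sketched as follows. This is only an illustration under assumptions, not how --sync is implemented in cali2traceevent: since every rank records the ts.sync snapshot right after the same barrier, those snapshots can be treated as roughly simultaneous, and each rank's timeline can be shifted by its offset from a reference rank. The function names and the dictionary layout are hypothetical.

```python
def compute_offsets(sync_ts, reference_rank=0):
    """Return each rank's clock offset relative to the reference rank.

    sync_ts maps rank -> timestamp of that rank's ts.sync record
    (hypothetical layout, for illustration only).
    """
    ref = sync_ts[reference_rank]
    return {rank: ts - ref for rank, ts in sync_ts.items()}


def align(events, offsets):
    """Shift each event's timestamp onto the reference rank's clock.

    events: list of dicts with "rank" and "ts" keys (hypothetical layout).
    """
    return [dict(ev, ts=ev["ts"] - offsets[ev["rank"]]) for ev in events]


# Made-up timestamps: the three ranks disagree about "now" at the barrier
sync_ts = {0: 1000, 1: 1250, 2: 900}
events = [
    {"rank": 0, "name": "solve", "ts": 1100},
    {"rank": 1, "name": "solve", "ts": 1350},
    {"rank": 2, "name": "solve", "ts": 1000},
]
aligned = align(events, compute_offsets(sync_ts))
```

With these made-up numbers, each rank started `solve` 100 time units after the barrier, so all three aligned timestamps coincide on the clock of rank 0.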
Yes, it seems that the problem was from multithreading. Running the code in serial fixed it. Thanks.