LLNL/Caliper

Problems using cali2traceevent

Closed this issue · 2 comments

I have encountered two problems using cali2traceevent (which is a pretty nice tool, by the way).

  • Every time I tried to run cali2traceevent, I get the error below:
Traceback (most recent call last):
  File "cali2traceevent.py", line 369, in <module>
    main()
  File "cali2traceevent.py", line 344, in main
    converter.read_and_sort(input)
  File "cali2traceevent.py", line 151, in read_and_sort
    self._process_record(rec[1])
  File "cali2traceevent.py", line 202, in _process_record
    self._process_event_end_rec(rec, (pid, tid), key, trec)
  File "cali2traceevent.py", line 226, in _process_event_end_rec
    btst = self.rstack[(loc,attr)].pop()
IndexError: pop from empty list

To fix this, I just check that self.rstack[(loc,attr)] is not empty before calling pop() (see below). I don't know if it's the correct fix, but it produces output that looks correct for my application.

    def _process_event_end_rec(self, rec, loc, key, trec):
        attr = key[len("event.end#"):]
        if self.rstack[(loc,attr)]:
            btst = self.rstack[(loc,attr)].pop()
            tst  = _get_timestamp(rec)

            self._get_stackframe(rec, trec)

            trec.update(ph="X", name=rec[key], cat=attr, ts=btst, dur=(tst-btst))
  • When I try to use cali2traceevent with an MPI code, all the processes show up as process 0 and thread_id 0. Caliper does create multiple event files, but I guess that cali2traceevent cannot handle them together. Is that correct?

Hi @Rombur ,

The stack errors indicate to me that either a begin record is missing somewhere, or records from different process/thread locations got mixed up. Are you tracing a multi-threaded application?
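To illustrate why mixed-up records trigger this error: judging from the traceback, the converter keeps a per-(location, attribute) stack of begin timestamps and pops one for each matching end record. The sketch below is a simplified assumption of that logic (names like rstack, process_begin, and process_end are illustrative, not the tool's actual API); an end record whose begin was filed under a different location pops from an empty list:

```python
from collections import defaultdict

# Simplified sketch of begin/end matching, assumed from the traceback above.
# Each (location, attribute) pair gets its own stack of begin timestamps.
rstack = defaultdict(list)

def process_begin(loc, attr, ts):
    # A begin record pushes its timestamp onto the stack.
    rstack[(loc, attr)].append(ts)

def process_end(loc, attr, ts):
    # An end record pops the matching begin timestamp to compute a duration.
    begin_ts = rstack[(loc, attr)].pop()
    return ts - begin_ts

process_begin(("pid0", "tid0"), "function", 100)
print(process_end(("pid0", "tid0"), "function", 150))  # matched pair: 50

# If thread records got interleaved under the wrong location, the end
# record arrives at a stack that never saw the begin:
try:
    process_end(("pid0", "tid1"), "function", 160)
except IndexError as e:
    print(e)  # pop from empty list
```

This is why a missing begin record, or records attributed to the wrong process/thread, surfaces as "IndexError: pop from empty list" rather than a more descriptive error.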

As for MPI, cali2traceevent can work with multiple MPI ranks. You'll just need to record them with the MPI information, which you can get e.g. with CALI_CONFIG=event-trace,trace.mpi. This will record the MPI rank and MPI calls. You can then merge all the resulting .cali files together, e.g. with: cali2traceevent.py *.cali. That way you should get a separate timeline for each MPI rank.

Timestamps from the MPI processes will likely be out of sync. However, you can add a synchronization record to your trace by adding and calling the function below somewhere in the program, and then use the --sync option for cali2traceevent, e.g. cali2traceevent.py --sync *.cali.

#include <mpi.h>
#include <caliper/cali.h>

void caliper_ts_sync(MPI_Comm comm)
{
    static int count = 0;

    /* Create (or look up) an integer attribute marking the sync point */
    cali_id_t attr =
        cali_create_attribute("ts.sync", CALI_TYPE_INT, CALI_ATTR_DEFAULT);
    cali_variant_t val =
        cali_make_variant_from_int(count++);

    /* Barrier so all ranks record the snapshot at (nearly) the same time */
    MPI_Barrier(comm);
    cali_push_snapshot(CALI_SCOPE_PROCESS | CALI_SCOPE_THREAD, 1, &attr, &val);
}

Yes, it seems the problem was caused by multithreading; running the code in serial fixed it. Thanks.