softdevteam/krun

Division by zero crash on startup experiment

vext01 opened this issue · 3 comments

[2017-09-06 12:31:18: DEBUG] Fatal Krun error: float division by zero
  File "./krun/krun.py", line 231, in main
    inner_main(mailer, on_first_invocation, config, args)
  File "./krun/krun.py", line 357, in inner_main
    sched.run()
  File "/home/kruninit/warmup_experiment/krun/krun/scheduler.py", line 524, in run
    measurements, instr_data, flag = job.run(self.mailer, self.dry_run)
  File "/home/kruninit/warmup_experiment/krun/krun/scheduler.py", line 373, in run
    stdout, stderr, rc, self.sched.config)
  File "/home/kruninit/warmup_experiment/krun/krun/util.py", line 293, in check_and_parse_execution_results
    config.AMPERF_RATIO_BOUNDS)
  File "/home/kruninit/warmup_experiment/krun/krun/amperf.py", line 70, in check_amperf_ratios
    busy_threshold, ratio_bounds)
  File "/home/kruninit/warmup_experiment/krun/krun/amperf.py", line 90, in check_core_amperf_ratios
    ratio = norm_aval / norm_mval

Traceback (most recent call last):
  File "./krun/krun.py", line 395, in <module>
    main(parser)
  File "./krun/krun.py", line 240, in main
    raise exn
ZeroDivisionError: float division by zero

Probably because the iteration is so short...

Yes

(Pdb) list
 87             # normalise the counts to per-second readings
 88             norm_aval = float(aval) / wctval
 89             norm_mval = float(mval) / wctval
 90             if norm_mval == 0.0:
 91                 import pdb; pdb.set_trace()
 92  ->         ratio = norm_aval / norm_mval
 93             ratios.append(ratio)
 94  
 95             if norm_aval > busy_threshold:
 96                 # Busy core
 97                 busy_iters.append(True)
(Pdb) aval
0
(Pdb) mval
0
(Pdb) wctval
13868.715989
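
The values above show why the division fails: both counter deltas are zero, so `norm_mval` is 0.0 regardless of the wall-clock time. As a rough illustration only, a guard could look something like the sketch below. `safe_amperf_ratio` is a hypothetical helper, not part of Krun, and returning `None` for "no usable ratio" is an assumption about what callers would want.

```python
def safe_amperf_ratio(aval, mval, wctval):
    """Hypothetical helper (not part of Krun).

    Returns the APERF/MPERF ratio for one core and one iteration, or
    None when the ratio is undefined because the MPERF delta (or the
    wall-clock time) is zero.
    """
    if wctval <= 0:
        return None

    # Normalise the counts to per-second readings, as the original code does.
    norm_aval = float(aval) / wctval
    norm_mval = float(mval) / wctval

    if norm_mval == 0.0:
        # Counters did not advance: no meaningful ratio for this iteration.
        return None

    return norm_aval / norm_mval


# The values from the debugger session above.
print(safe_amperf_ratio(0, 0, 13868.715989))
```

A caller would then have to decide whether a `None` entry means "skip this iteration" or "flag the run"; that policy is not settled in this thread.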

I suppose the correct fix would be a NO_AMPERF_CHECK config option, but I'm keen to find a workaround for the 1.2 data, as it already uses an older version of Krun...

Thoughts?

The fix we've agreed upon is not to check the ratios in startup.krun.
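
For illustration, skipping the ratio check for a single experiment might be sketched as below. This is not the actual change made to Krun: the `Config` class, the `CHECK_AMPERF_RATIOS` attribute, and `maybe_check_ratios` are all hypothetical names, assuming the check can be switched off per config file.

```python
class Config(object):
    """Stand-in for an experiment config; CHECK_AMPERF_RATIOS is hypothetical."""

    def __init__(self, check_amperf_ratios=True):
        self.CHECK_AMPERF_RATIOS = check_amperf_ratios


def maybe_check_ratios(config, avals, mvals, check_fn):
    """Run check_fn(avals, mvals) only if the config asks for it.

    Returns None when the check is disabled, so very short iterations
    (where the counters may not advance) never reach the division.
    """
    if not config.CHECK_AMPERF_RATIOS:
        return None
    return check_fn(avals, mvals)


def naive_check(avals, mvals):
    # Deliberately unguarded, to mimic the failing division.
    return [float(a) / m for a, m in zip(avals, mvals)]


# A startup-style config with the check disabled does not crash.
startup_config = Config(check_amperf_ratios=False)
print(maybe_check_ratios(startup_config, [0], [0], naive_check))
```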

Fixed.