cahirwpz/mimiker

Weird time jumps, probably due to icount being disabled in QEMU

Opened this issue · 4 comments

Excerpt from https://github.com/cahirwpz/mimiker/pull/1006/checks?check_run_id=1926763381:

| 8.938410 |   2 | kern/callout.c:101   | KL_CALLOUT | Add callout {0xffff00000010bb20} with wakeup at 8948.                       |
| 8.938421 |   2 | kern/callout.c:101   | KL_CALLOUT | Add callout {0xffff00000010bb50} with wakeup at 8949.                       |
| 8.938431 |   2 | kern/thread.c:179    | KL_THREAD  | Join 2 {0xffff0000000021d0} with 167 {0xffff000000002368}                   |
| 8.938455 |   2 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 2 goes to sleep on 0xffff000000002370 at pc=0xffff000000213cf4       |
| 8.938499 | 171 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 171 goes to sleep on 0xffff0000002301e4 at pc=0xffff00000023021c     |
     ^
     |
     v
| 8.963309 |   0 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 1 from 0xffff0000002a9090 at pc=0xffff000000228048            |
| 8.966100 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 167 from 0xffff0000002301e4 at pc=0xffff00000023021c          |
| 8.966120 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 168 from 0xffff0000002301e4 at pc=0xffff00000023021c          |
| 8.966134 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 169 from 0xffff0000002301e4 at pc=0xffff00000023021c          |

Excerpt from https://github.com/cahirwpz/mimiker/pull/1006/checks?check_run_id=1926763381:

| 4.025008 | 134 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 134 goes to sleep on 0xffff0000002301a4 at pc=0xffff0000002301dc     |
| 4.025024 | 135 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 135 goes to sleep on 0xffff0000002301a4 at pc=0xffff0000002301dc     |
| 4.025040 | 136 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 136 goes to sleep on 0xffff0000002301a4 at pc=0xffff0000002301dc     |
    ^
    |
    v
| 4.125152 |   0 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 1 from 0xffff0000002a9090 at pc=0xffff000000228018            |
| 4.134341 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 132 from 0xffff0000002301a4 at pc=0xffff0000002301dc          |
| 4.134365 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 133 from 0xffff0000002301a4 at pc=0xffff0000002301dc          |

In both cases the test callout_sync fails due to the time jump, which causes callouts that are supposed to run on different ticks to run at the same time.
The AArch64 timer is driven by the QEMU virtual clock, not the host clock, which makes the time jump all the more strange.

UPDATE: the same thing happens on MIPS: https://github.com/cahirwpz/mimiker/runs/1928695030

| 6.950163 | 102 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 102 goes to sleep on 0xc0162b18 at pc=0xc0162b68                     |
| 6.950205 | 103 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 103 goes to sleep on 0xc0162b18 at pc=0xc0162b68                     |
| 6.950244 | 104 | kern/sleepq.c:119    | KL_SLEEPQ  | Thread 104 goes to sleep on 0xc0162b18 at pc=0xc0162b68                     |
| 7.063905 |   0 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 1 from 0xc01fa420 at pc=0xc01548fc                            |
| 7.064159 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 100 from 0xc0162b18 at pc=0xc0162b68                          |
| 7.064252 |   1 | kern/sleepq.c:157    | KL_SLEEPQ  | Wakeup thread 101 from 0xc0162b18 at pc=0xc0162b68                          |

Another one caught on AArch64, this time not during the callout_sync test: https://github.com/cahirwpz/mimiker/pull/1005/checks?check_run_id=1928670781

| 39.504748 | 600 | aarch64/trap.c:54   | KL_UNDEF   | Last message repeated 1 times.                                              |
| 39.824406 | 600 | aarch64/trap.c:54   | KL_UNDEF   | 25 at $ffff0000002293f0, caused by reference to $410000!                    |

Also, the fact that the time in klog shows 40 seconds before timing out suggests that the timer is in fact driven by host time, since the test timeout on the host is 40 seconds.

Ok... without the icount option, the virtual clock is basically the host's monotonic clock.
I have traced how QEMU emulates the value of the cntpct_el0 register:

  1. Definition of the register: reading the value is accomplished using the gt_cnt_read() function.
  2. gt_cnt_read() returns gt_get_countervalue(), which returns the value of the virtual clock (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) in nanoseconds divided by the counter period in nanoseconds
  3. qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) returns cpus_get_virtual_clock()
  4. cpus_accel->get_virtual_clock is not set, because icount is disabled (see here). Therefore, cpus_get_virtual_clock() returns cpu_get_clock(), which ultimately uses get_clock(), which calls clock_gettime(CLOCK_MONOTONIC).

I think the way to resolve this is to enable icount in QEMU. If I remember correctly, we initally had it enabled, but it was disabled because guest time was elapsing faster than host time.

The shift suboption of icount controls the number of guest nanoseconds that each guest instruction takes. If shift=N, each instruction executed in the guest advances the guest time by 2^N nanoseconds. If the shift value is too high, guest time will pass very fast. If it's too low, guest time will pass slowly. If shift=auto, QEMU will adjust the shift value at runtime so that guest time passes at more or less the same rate as host time. The drawback is that guest's perceived rate of instruction execution will vary.

Another solution is to have shift=N,align=on, with N large enough for guest time to elapse faster than host time. With these options, the guest's perceived instruction execution rate remains constant, and QEMU will periodically check the gap between guest and host time. If it's too large, QEMU will sleep so that host time catches up.

I don't think we rely on the instruction execution rate being constant, so shift=auto seems like a good enough solution to me.

Ok... shift=auto seems to cause test_signal_mask() to fail, probably because 3 calls to sched_yield() aren't enough to make sure the child process runs when shift is very high (the maximum value is 10, so one executed instruction advances guest time by 1024 ns). We can fix this by using a better method to wait for a signal to become pending, e.g. sigpending(), which is implemented in #1012.
Also, it might be a good idea to use a constant shift value for non-interactive tests, and use shift=auto for everything else.