DeanoBurrito/northport

Kernel crashes with SMP enabled at high core counts

DeanoBurrito opened this issue · 1 comment

This looks like a heisenbug. So far I've been able to reproduce it on my desktop (16 cores). My laptop (also 16 cores) seems to be fine, but this may be timing-related as the mobile processor is a lot slower. The kernel works on the same machines when only using a single core, and increasing the number of cores in use seems to make the crash happen sooner.
This can happen in KVM too, although it's a lot rarer, and adding print statements to the scheduling logic seems to remove the issue outright.

For reference, there are only a handful of places where multiple cores interact with each other:

  • The clock: when an AP adds a new clock event it accesses the global event list.
  • The scheduler: each core operates mostly independently, but [en/de]queuing threads can happen across cores, and work stealing will access (atomically, so unlikely to be the issue) another core's work queue. Idle threads are unique to each core, so shouldn't be a problem.
  • IPI mailboxes: any core can access any core's mailbox (including its own).
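
To make the mailbox case above concrete, here's a rough user-space sketch of the access pattern: several threads standing in for cores post to one shared mailbox, with a spinlock serialising the writes. All names here (`SpinLock`, `Mailbox`, `simulate_posts`) are hypothetical and not taken from the kernel source, and `std::thread` is only a stand-in for real cores, so treat it as an illustration of the pattern rather than the actual implementation or the actual fix.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical spinlock, for illustration only.
class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value: keep spinning while
        // some other thread already held the lock.
        while (locked.exchange(true, std::memory_order_acquire)) {
        }
    }

    void unlock() { locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked{false};
};

// Hypothetical stand-in for a per-core IPI mailbox.
struct Mailbox {
    SpinLock lock;
    std::vector<int> pending;  // queued messages (here: just the sender id)

    // Any "core" may call this on any mailbox, so the list is only
    // touched while the lock is held.
    void post(int message) {
        lock.lock();
        pending.push_back(message);
        lock.unlock();
    }

    std::size_t count() {
        lock.lock();
        std::size_t n = pending.size();
        lock.unlock();
        return n;
    }
};

// Runs `cores` sender threads, each posting `perCore` messages to a single
// mailbox, and returns how many messages ended up queued.
std::size_t simulate_posts(unsigned cores, unsigned perCore) {
    Mailbox box;
    std::vector<std::thread> senders;
    for (unsigned c = 0; c < cores; ++c) {
        senders.emplace_back([&box, c, perCore] {
            for (unsigned i = 0; i < perCore; ++i) {
                box.post(static_cast<int>(c));
            }
        });
    }
    for (auto& t : senders) {
        t.join();
    }
    return box.count();
}
```

With the lock in place, `simulate_posts(4, 1000)` should always come back with 4000 queued messages. Removing the lock from `post` makes the `push_back` calls race, which is the sort of timing-dependent failure that tends to get more likely with more cores and can vanish when extra work (like printing) shifts the timing.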

Fixed in 6b1886d.