google/kmsan

Uninitialized bitfields create long origin chaings

ramosian-glider opened this issue · 5 comments

Starting from Linux v.4.20-rc2 there're seven 1-bit fields in struct mmu_gather packed into an unsigned int value. One of those fields is updated in a tight loop. On each update, the remaining 25 uninitialized bits receive a new origin generated from the previous one:

u32 values;
for (i = 0; ... ; i++) {
  values |= FIELD_MASK;
  // u32 *origin = get_origin_ptr(&values);
  // *origin = __msan_chain_origin(*origin);
}

This generates a lot of unnecessary origin chains, which may quickly fill the stack depot and start reporting warnings.

In user-space we limit length of origin chains to some small number (e.g. 5). Perhaps we need the same in kernel.

Also, if it's updated in a loop, we could ignore addition of the same stack to the chain. I.e. if the old chain is (A+B) and we add B again, then we return the current (A+B) stack id. Does it make any sense? Or we already do this?

In kernel stackdepot mapping from id to stack is super fast:
https://github.com/torvalds/linux/blob/master/lib/stackdepot.c#L197-L207
Should do the same in userspace.

We do have a limit on chaining (7 stacks), and actually in this particular case the problem doesn't lead to stack depot explosion, because the stack traces are always the same (they start from the exit() syscall).
There's a warning in KMSAN that's printed on every 10000 dropped chains, but the case in question is on a hot path, and it renders this warning useless (and also leads to tons of unnecessary accesses to the stack depot).

This shouldn't be a problem anymore.
When an origin chain hits the maximum length, it stops growing, and __msan_chain_origin() will continue assigning the same origin id to every origin created from that chain.
There will be cases when this happens too often - I am planning to introduce a tracepoint to detect such cases, but they won't be hurting us noticeably anyway.