kata-containers/agent

Race between getDeviceName() and uevent arrival

dgibson opened this issue · 0 comments

Get your issue reviewed faster

I spotted this bug while looking at the code, I haven't reproduced it.

Description of problem

getDeviceName() takes some form of device address (generally a piece of the device's sysfs path) and returns the path to the associated device node in the VM. It gets this information from kernel uevents which are watched by listenToUdevEvents().

Because hotplugs and uevents are happening asynchronously, it needs to handle both the case that the relevant uevent has already occurred and the case where it needs to wait for the event to happen. To do this the (poorly named) pciDeviceMap in the sandbox structure contains a mapping from sysfs paths to device paths which is filled in by listenToUdevEvents() regardless of whether we're watching for a specific device or not.

Both the pciDeviceMap and deviceWatchers are protected by the same mutex. getDeviceName() takes the mutex, first checks pciDeviceMap and if the device isn't there yet records its interest in deviceWatchers. listenToUdevEvents() (also under the mutex) will signal the channel stored in deviceWatchers when the relevant event occurs.

Expected result

getDeviceName() should behave identically whether the relevant uevent has already occurred, or occurs after calling it.

Actual result

In fact, the matching conditions for the device address to sysfs path are currently different between getDeviceName() (a simple string.Contains()) and listenToUdevEvents() (a bunch of different conditions for different device types).

This means that getDeviceName() could match a device and produce a result if the uevent has already happened, but will fail to get a match if it has to wait (the reverse isn't possible in practice, since the conditions in listenToUdevEvents() are strictly more restrictive than that in getDeviceName()).

Further information

Being really specific about what we're matching sounds like a good idea in theory. However the current conditions in listenToUdevEvents() do a poor job of this. They're hard to understand, highly coupled to distant parts of the code, and still not really specific, since they just matches any of a bunch of criteria rather than being determined by the specific watcher entry.

Just using strings.Contains() is pretty crude, but if it's good enough in getDeviceName() it should be good enough in listenToUdevEvents() too.