Missing strategy for watching pseudo filesystems like those in /sys or /proc.

Question

Missing strategy for watching pseudo filesystems like those in /sys or /proc.

Closed this issue 2 years ago · 6 comments

notify has a general purpose PollWatcher that would otherwise be able to serve this role however it relies exclusively on modification timestamps to detect changes which doesn't work reliably depending on the details of the pseudo filesystem that's being watched. For example, /sys/class devices (such as /sys/class/net/*) update modification times of all "files" on any change and therefore would trigger a sweep of spurious changes frequently. Others don't update at all after boot so there would be no reasonable work-around to detect changes at all.

We can work around by offering alternate strategies for PollWatcher such as hashing the contents of directories/files to detect changes. This of course will be quite inefficient for large files but the assumption would be that developers wanting this feature are likely interested in very small transactions within /proc, /sys or other pseudo filesystems that offer small bits of useful insights into the kernel's behaviour. Given that the behaviour is opt-in, we can also offer some documentation to warn developers of the pit falls.

I propose we add a more sophisticated PollWatcher::new_with_config API that allows us to specify the change detection strategy and replace all the parts doing mtime comparison with a more abstract approach that delegates to the selected strategy to detect the change. I can work on a PR to make this happen (currently I just copied much of the code and modified it locally to suit my needs) but wanted to collect feedback first on this approach before I prepare it for a proper integration.

Thanks, keep up the great work folks!

Answer 1 · 2022-03-15T15:00:55.000Z

Hey, I like the idea, although if possible I'd add this to the Config struct.

That way we can stay backend agnostic and people can use the same config for multiple things (making it easier to re-integrate with a debouncer).

Answer 2 · 2022-03-15T15:02:01.000Z

We'll have to document the hashing approach well and can probably squeeze out a lot of performance or memory usage if we try - although it's fine to ignore for the start.

If you want to you could make a PR.

Answer 3 · 2022-03-15T18:40:13.000Z

There are definite pros and cons to making this a generic feature, and I think we should carefully consider them before promoting it to a first class feature. If it's OK with you I'd feel safer adding the support to poll.rs first then looking to expand it later after some more thoughtful discussion. Allow me to lay out my thoughts a little bit:

Pros for a generic Config feature for hashing:

Backend agnostic API that has consistent semantics on all platforms / uses
Ability to support a less efficient but more precise watch strategy such as the one used by Facebook's watchman (see https://facebook.github.io/watchman/docs/cmd/query.html, content.sha1hex)

Cons:

Unique work to implement for each watch strategy (i.e. costly to build and maintain)
Ambiguous implementation details: should we suppress Op::WRITE's which result in the same file contents but updated metadata? If we didn't, how could we inform the caller that the file was written but it's hash may or may not be the same?
Efficiency is hard to balance with API coherency. For example, do we automatically start rehashing on all Op::WRITE's or we do wait for Op::WRITE_CLOSE? If we wait for WRITE_CLOSE, do we suppress the WRITE's? If not, do we wait for the hashing after each WRITE before dispatching it? If we do not wait, how do we communicate we will get the hash async later?

To be clear, I'm not saying we shouldn't do this, I'm saying we need to be careful that this isn't undesirable feature creep which may be hard to maintain, to use, and to understand moving forward.

Answer 4 · 2022-03-15T18:50:26.000Z

Oh I didn't meant to implement this in every watcher, just add it to the generic config so you can use one config object to rule them all, build a fallback or such (passing to "recommended").

Answer 5 · 2022-03-15T18:55:14.000Z

Hmm, my concern with doing that is that the API becomes even more ambiguous if we start having backend-specific configurations in a generic object. I also can't think of a use case where we want something like pseudo filesystems to work in a generic way when we already know the generalizations don't work? This is kind of an odd use case really because it's actually a highly platform-specific feature masquerading in the platform-agnostic PollWatcher. It's just that adding an entirely new backend specifically for sysfs/procfs, etc feels at best a bit wasteful when the only change necessary is to just hash file contents instead of strictly relying on mtime.

Answer 6 · 2022-03-15T19:26:02.000Z

I don't think it's only sysfs/procfs - any network/fuse mount could behave like that (and I'd guess many do).
Edit: In fact windows file sync behaves like that when writing out big files - you see 0KB size when in-progress until you try to read it.

But I can see the point for making this poll-watcher specific.