facebook/watchman

Time Based Queries and Symlink Following

wez opened this issue · 0 comments

wez commented

Watchman does not currently "support symlinks".
#105 (comment) has some thoughts on how we might address this.

This particular issue encompasses one portion of this effort:

How do we handle notifications about symlinks?

There are a couple of cases:

  1. Inside /root we have /root/A ->/root/B. WhenBis changed, we'd like to indicate thatA` has changed. Both A and B are in the same watched root.
  2. /rootA/A -> /rootB/B Again, when B is changed, we'd like to indicate that A has changed. A and B are in different roots.
  3. /rootA/A -> /rootB/B and /rootC/C -> /rootB/B. When B is changed, we'd like to indicate that A and C have changed. There are three different roots.

There can be an arbitrary number of things that point to the B node. We need an efficient way to track this graph so that we can deduce the set of logical paths that may appear to have changed.

Simplistically, whenever we resolve a symlink (see #347 for roughly the right place to hook this in) we want to make a note of the reverse flow, so when we track A -> B we need to record against B that it has an alias at /rootA/A. It can have multiple aliases.

Using this graph in queries

Whenever we process a query that has follow_symlinks enabled, in addition to matching all of the locally matching file nodes as usual, we also need to do something like this psuedo code:

foreach (recently_changed_files as file) {
   foreach (file->aliases as alias) {
      evaluate_subscription_criteria(alias);
   }
}

it's more complicated though; if in the example above /root/B is a dir and we notice that /root/B/file changed, file won't have any direct aliases. Instead, we have to walk up to the parent and expand the set of aliases with those of its parent for each level up to the root of that watch. Each level generates a candidate path for the alias. For example (3) above, we'll emit aliases /rootA/A/file and /rootC/C/file

For the path walking query generators (path and glob) these aliases are used as a source of additional file nodes for the walk. For most queries though, the resolved file will be passed through the expression evaluator.

Using this graph in subscriptions

For subscriptions to work, after each change in a root, we'd need to expand the possible set of changed files using whatever we settle on for the above, and then synthesize changes against the appropriate set of roots.

The trick here is that these synthesized results are never valid to track in the data structures that watchman maintains for a root (it takes great pains to avoid recording information about symlink targets), and we need to propagate them to the subscription processing phase.

So we'd need to build up a list of synthetic file nodes and then wake up the IO thread such that it notices and dispatches this list.

This is all pretty expensive

The alias list expansion potentially adds depth-of-tree and number-of-aliases as factors to the time complexity for processing changes. We'll want to find a way to tackle this problem that can do better than this.

Can we use our radix tree data structure to make this less of a problem? We can answer questions like "does path FOO have any aliases" by doing a prefix match against a radix tree to make things cheaper for the non-aliased case.