watchexec/watchexec

watchexec takes 8s to start, does 2M syscalls across 148 threads

kentonv opened this issue · 16 comments

I'm using watchexec 1.20.5 on Linux. It works great! Except for one thing...

I have a directory structure like this:

src/   # my code
deps/  # literally like 200,000 files

When I run watchexec I do it like this:

watchexec --watch src make

watchexec spends about 8 seconds scanning through the entirety of deps. The -vvvv log, which is 528 megabytes long, suggests it is looking for "ignore files". The various --no-*-ignore options don't seem to change anything.

During this time, according to strace, watchexec apparently creates 148 threads which perform 2,458,067 system calls, of which 2,337,801 are futex operations. It also stat()s 44,270 files and opens 14,634 of them. After all that, it sets up 26 inotify watches.

Could watchexec be made to scan only the directory I asked it to watch?

That's Known Issue Number 2! You can work around for now with --project-origin src, or by starting watchexec in src and setting the cwd of the command with --workdir ...

Oops, sorry I missed that! Thanks for the tip, --project-origin does seem to solve the problem.

The excessive context switching via futex operations might be worth looking into as an optimization opportunity -- it seems to me that even without behavioral changes, some time could probably be saved there. But feel free to close this if you prefer.

That's true, yes, we could likely get much better performance by optimising where file I/O happens. I'll have a specific look at this.

@passcod is .gitignore parsing currently a depth-first search, when it should be breadth-first starting at the root for project origin detection, and then recursively applying ignore files in descent?

Ignore files only apply to their subtrees, yet after installing an NPM package, I see traces like the following:

  2022-09-05T20:14:42.704608Z TRACE watchexec::ignore::files: found a file, path: "/home/friel/c/github.com/pulumi/pulumi/.gitignore"
  2022-09-05T20:14:42.704642Z TRACE watchexec::ignore::filter: reading ignore file, file: IgnoreFile { path: "/home/friel/c/github.com/pulumi/pulumi/.gitignore", applies_in: Some("/home/friel/c/github.com/pulumi/pulumi"), applies_to: Some(Git) }
  2022-09-05T20:14:42.704761Z TRACE watchexec::ignore::filter: adding ignore line, line: "**/node_modules/"
...
    in watchexec::ignore::files::visit_path with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__"
    in watchexec::ignore::filter::check_dir with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__/data"
    in watchexec::ignore::files::dir_entry with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__/data"
    in watchexec::ignore::files::visit_path with path: "/home/friel/c/github.com/pulumi/pulumi/sdk/nodejs/node_modules/tsconfig-paths/src/__tests__"

If ignore files follow the man gitignore spec, then there's no point in crawling any folder that matches a previously-discovered filter.

An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined.

This seems like it would be possible for me to make a contribution here. The product I work on, Pulumi, recently shipped a a change to embed watchexec to support pulumi watch functionality. See: https://github.com/pulumi/watchutil-rs

Yes, that's absolutely the case (well, kind of, it's complicated, there's actually two layers of figuring out ignore files and that's where the issues arise; happy to chat if you want to know more). I'd love a contribution! The work would be in the ignore-files crate and filterer, which get used in the relevant places in watchexec.

I have a similar issue with fd -tf . 'src/' | wc -l returning 75.
However, jabbascript (angular in my case) has a giantic filesize.

I did use watchexec -w src/ 'ng build && stuff_to_install'.

Aside from any other possible optimisations, shouldn't it be fairly trivial to make watchexec skip looking for ignore files if all of --no-default-ignore --no-global-ignore --no-project-ignore --no-vcs-ignore are set?

Yes. That would be a conditional around here, if you want to PR this: https://github.com/watchexec/watchexec/blob/main/crates/cli/src/filterer/globset.rs#L23

I also ran into this (very significant) problem

watchexec takes 8s to start, but when using --project-origin $PWD, it's down to 1s

It seems "scanning and watching the entire filesystem for files that could slightly modify the behavior of the program" is more anti-feature than feature for a small utility like this.

While not fully fixed, 1.23 is more clever around this, and you can use --no-discover-ignore to disable looking for ignore files altogether.

While a more thorough fix would be great, --no-discover-ignore works well for us in the meantime as a workaround

Hey @kentonv a fix for this issue made it into watchexec v1.25.0+, would you mind trying the newer version and see if your problem is fixed?

I've measured a bit myself and it's not really fixed, though your improvements did help.

However, one larger issue than the ignores, which I hadn't twigged on until I started measuring in earnest, is the notify watches, which are recursed by notify and not by us.

For example looking at watchexec with RUST_LOG=notify=trace on Linux shows notify adding watches for the entire .git folder recursively, among other things; fixing that involves writing the watch recursion into watchexec and using non-recursive watches at the notify boundary; I've written up surface notes on the complexities therein: https://cohost.org/watchexec/post/4070897-early-feature-concep

It turns out that this particular issue here is an onion of causes adding up to a large negative effect, rather than a single buggy layer to be fixed!

Ahh thanks for checking this -- sad that this issue isn't fixed completely, but thanks for linking to the explanation/your thoughts on it!

I'm also running into this issue. Using --no-discover-ignore helps, but --project-origin $PWD does not.