fabiospampinato/watcher

Unexpected behavior and some other issues

aleksey-hoffman opened this issue · 10 comments

Thanks for creating a "chokidar" alternative. I'm considering moving to "watcher" at some point in the future because it handles drive root dirs like C:/ properly (doesn't emit unlink events on init), but I encountered some problems with it, as well as some unexpected behavior, while testing it:

  1. If you set, let's say, depth: 3 and recursive: true it still detects changes at deeper levels: 4, 5, 6, 7, ... and so on, which doesn't make much sense since depth is supposed to represent the scan depth limit.
  2. If you set depth option but do not specify recursive: true it won't detect any changes in the nested directories at depth > 0, which doesn't make sense either. I think it should automatically set recursive: true if you specify depth > 0. Am I misunderstanding the purpose of this property? What's the point of setting depth: 10 if it does not detect any changes at depth > 0 until you set recursive: true?
  3. If you call the function that inits the watcher multiple times in a row for different paths with a relatively high depth level, like 6, so that it doesn't have a chance to finish the previous task, it doesn't close the previous watchers; it just keeps scanning the directory specified in the first function call until the scan is finished. I think this happens because it scans the drive at the maximum possible speed, and, to no surprise, such a computationally intensive task blocks the thread completely, as if it were a synchronous function. I think it can be solved with a timeout function that pauses the scan every 1 ms or so, or by setting a lower highWaterMark value for the Node read stream that scans the drive; I'm not sure.
  4. Sometimes, it starts throwing the Uncaught TypeError shown below during the drive scan (before it emits the ready event). It happens 100% of the time for some paths like C:/ (system drive) past a certain depth (for me it's 4) and only ~25% of the time for other paths. I just reload the app, and after a few seconds manually trigger the watcher function for some path with a button. Sometimes I get no errors and it logs ready, sometimes I get the error once before it logs ready, sometimes it throws the error multiple times:
Uncaught TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received null
    at validateString (internal/validators.js:117)
    at Object.resolve (path.js:139)
    at WatcherHandler.onWatcherChange (watcher_handler.js?a049:197)
    at FSWatcher.emit (events.js:310)
    at FSEvent.FSWatcher._handle.onchange (internal/fs/watchers.js:135)

I'm not sure how to catch that error. I tried adding watcher.on('error', error => {}) and wrapping the whole function in a try/catch block. Neither catches the error.
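From what I can tell, the error is thrown from an internal fs event callback on a later tick of the event loop, which would explain why a synchronous try/catch around the init code can never catch it. The only last-resort way I can think of to at least observe it is a process-level handler (just a sketch; I'm assuming Node's process object is exposed in the Electron worker context):

```javascript
// Last-resort observation only, not a fix: errors thrown from async fs event
// callbacks bypass try/catch and, apparently, the watcher's 'error' event.
// Assumes Node's `process` object is available in this (Electron) context.
process.on('uncaughtException', error => {
  console.error('Uncaught error from a watcher callback (?):', error.message)
})
```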

Code

I init this watcher inside of a web worker in an Electron app.

dirWatcherWorker.js

import Watcher from 'watcher'
let watcher = null
let dirPath = 'C:/'

self.addEventListener('message', () => {
  initDirWatch()
})

function initDirWatch () {
  // Reset watcher
  try { watcher.close() } catch (error) {}

  // Init watcher
  watcher = new Watcher(dirPath, { 
    ignoreInitial: true, 
    depth: 5,
    recursive: true
  })

  // Init listeners
  watcher.on('all', (event, targetPath, targetPathNext) => {
    console.log('Watcher:', event, targetPath, targetPathNext)
  })
  watcher.on('error', (error) => {
    console.log('Watcher:', error)
  })
}

Environment

OS: x64, Windows 10 20H2 build 19042.662
Exec env: web worker in an Electron app v10.1.3 (Chrome 85 + Node v12.16)

If you set, let's say, depth: 3 and recursive: true it still detects changes at deeper levels: 4, 5, 6, 7, ... and so on, which doesn't make much sense since depth is supposed to represent the scan depth limit.

There's actually a comment about this in the codebase:

depth?: number, //FIXME: Not respected when events are detected and native recursion is available, but setting a maximum depth is mostly relevant for the non-native implementation

The thing is that when native recursion is used (macOS and Windows), setting a depth limit doesn't actually prevent directories deeper than that from being watched; Node doesn't expose any API for controlling that.

So should the library just discard some events? At the end of the day setting a depth is just useful from a performance perspective I think.

If you set depth option but do not specify recursive: true it won't detect any changes in the nested directories at depth > 0, which doesn't make sense either. I think it should automatically set recursive: true if you specify depth > 0. Am I misunderstanding the purpose of this property? What's the point of setting depth: 10 if it does not detect any changes at depth > 0 until you set recursive: true?

The depth property sets the maximum depth watched, not the minimum depth watched, so basically when recursion is disabled it's useless. Maybe the library should emit an error when depth is used with recursion disabled?
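The check I have in mind would be trivial, something like this (a hypothetical sketch, not current watcher code; validateWatcherOptions is a made-up name):

```javascript
// Hypothetical option validation: surface an error when "depth" is set
// while recursion is disabled, since the option has no effect in that case.
function validateWatcherOptions (options) {
  if ((options.depth || 0) > 0 && !options.recursive) {
    throw new Error('"depth" has no effect unless "recursive: true" is also set')
  }
  return options
}
```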

If you call the function that inits the watcher multiple times in a row for different paths with a relatively high depth level, like 6, so it doesn't have a chance to finish the previous task, it doesn't close the previous watchers, it just keeps scanning the directory specified in the first function call, until the scan is finished. I think it's happening because it scans the drive at a max possible speed, and to no surprise, such a computationally intensive task blocks the thread completely, as if it was a synchronous function. I think it can be solved with a timeout function that pauses the scan every 1 ms or so, or by setting a lower highWaterMark value for the node's read stream that scans the drive, not sure.

Could you post some code about this, so that I can better understand the scenario?

Sometimes, it starts throwing the Uncaught TypeError shown below during the drive scan (before it emits ready event). It happens 100% of the time for some paths like C:/ (system drive) past certain depth (for me it's 4) and only ~25% of the times for other paths. I just reload the app, and after a few seconds manually trigger the watcher function for some path with a button. Sometimes I get no errors and it logs ready, sometimes I get the error once before it logs ready, sometimes it throws the error multiple times:

I'll look into that, it sounds like a bug.

try { watcher.close() } catch (error) {}

I don't think closing a watcher should throw an error, if it does the library should be handling it, I'll double check that it handles potential errors.

So should the library just discard some events? At the end of the day setting a depth is just useful from a performance perspective I think.

Hmm, I'm not sure actually. Are you saying that on Windows and Mac, this library always uses native watcher when recursive: true and therefore it sets depth: infinity and there's no way to reduce the CPU / drive usage by specifying a lower depth? In that case, I guess there's no point discarding the events.

By the way, doesn't Linux have a built-in file system watcher as well? I thought I'd read about it somewhere.

The depth property sets the maximum depth watched, not the minimum depth watched, so basically when recursion is disabled it's useless. Maybe the library should emit an error when depth is used with recursion disabled?

Yeah, that's what I'm saying as well. That's why I don't see the point of having two options controlling the same thing. When you set depth: 0, the library should scan only the specified directory (no recursion needed), and when you set depth: >0, it should use recursion automatically. I don't understand the purpose of the recursive option.
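In other words, something like this (a hypothetical sketch of what I mean, not watcher's actual API; normalizeDepthOption is a made-up helper):

```javascript
// Hypothetical single-option behavior: derive "recursive" from "depth"
// so that only one option controls how deep the watcher goes.
function normalizeDepthOption (options) {
  const depth = options.depth || 0
  return { ...options, depth, recursive: depth > 0 }
}
```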

Could you post some code about this, so that I can better understand the scenario?

As you probably know, async functions can block (freeze) the thread when they compute a computationally intensive task. In this case, watcher blocks the thread until it's done scanning the drive, which means you cannot cancel the task by calling watcher.close(), since it won't be executed until the thread is un-frozen.

import Watcher from 'watcher'
let watcher = null

function initDirWatch (dirPath) {
  console.log('Watcher init:', dirPath)

  // Reset watcher
  try { watcher.close() } catch (error) {}

  // Init watcher
  watcher = new Watcher(dirPath, { 
    ignoreInitial: true, 
    depth: 6,
    recursive: true
  })
  watcher.on('all', (event, targetPath, targetPathNext) => {
    console.log('Watcher:', event, targetPath, targetPathNext)
  })
}

// Init watcher for different paths
initDirWatch('C:/')
setTimeout(() => { initDirWatch('D:/') }, 100)
setTimeout(() => { initDirWatch('C:/Windows') }, 200)
setTimeout(() => { initDirWatch('D:/') }, 300)
setTimeout(() => { initDirWatch('C:/') }, 400)

The idea is simple, imagine this watcher is used in a file manager app. The user navigates through directories quickly, causing this function to run 5 times for different dirs in like 2 seconds. The first function would freeze the thread for like ~60 seconds until it's finished scanning the drive.

This is why I proposed implementing a pause that would un-freeze the event loop for a few ms and check whether there was a cancel event. Even if it paused only every 100 ms to check whether it was canceled, that would still be better than waiting 60 seconds for the scan to finish.

Expected behavior: each function call cancels the previous task by calling watcher.close() at the beginning of the function (as shown in the code example) and you should see the log that looks something like this:

>00:00:000 Watcher init: C:/
>00:00:010 Watcher init: D:/
>00:00:020 Watcher init: C:/Windows
>00:00:030 Watcher init: D:/
>00:00:040 Watcher init: C:/

Current behavior: the previous task cannot be canceled because the first function call blocked the thread. The thread remains blocked until it finishes the first drive scan which becomes irrelevant by the time it's done, because the path has been changed:

>00:00:000 Watcher init: C:/
>00:60:000 Watcher init: D:/   <=== takes 60 seconds to be called because the first call blocked the thread
>00:60:010 Watcher init: C:/Windows
>00:60:020 Watcher init: D:/
>00:60:030 Watcher init: C:/

I don't think closing a watcher should throw an error, if it does the library should be handling it, I'll double check that it handles potential errors.

It throws an error the first time the function is called, because the watcher is initialized as let watcher = null. It says it cannot call close() on null.
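For what it's worth, a simple guard instead of the try/catch avoids the problem on my side (plain JavaScript; nothing watcher-specific beyond the close() method used above):

```javascript
let watcher = null

// Reset safely: only call close() when a watcher actually exists,
// so the first call (when watcher is still null) doesn't throw.
function resetWatcher () {
  if (watcher !== null) {
    watcher.close()
    watcher = null
  }
}
```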

Thanks for looking into this 👍
I'm building an advanced modern file manager. Chokidar has 3 bugs related to drive root directories, which is why I'm considering moving to this module.

Hmm, I'm not sure actually. Are you saying that on Windows and Mac, this library always uses native watcher when recursive: true and therefore it sets depth: infinity and there's no way to reduce the CPU / drive usage by specifying a lower depth?

Yes, exactly. We could switch to the manual watching logic on all platforms, but using the native implementations should give much better performance.

By the way, doesn't Linux have a built-in file system watcher as well? I thought I'd read about it somewhere.

Apparently not a recursive one nodejs/node#36005 (comment)

I don't understand the purpose of the recursive option.

It would be cleaner to just have one option, but the current options align more closely with chokidar's, which people are already familiar with, and keeping them aligned makes it easier to switch libraries.

As you probably know, async functions can block (freeze) the thread when they compute a computationally intensive task. In this case, watcher blocks the thread until it's done scanning the drive, which means you cannot cancel the task by calling watcher.close(), since it won't be executed until the thread is un-frozen.

Scanning the directory is done asynchronously precisely in order not to block the thread, I'm not sure what's happening here, I'll have to take a closer look at this.

I'm building an advanced modern file manager. Chokidar has 3 bugs related to drive root directories, which is why I'm considering moving to this module.

It sounds like an interesting use case, especially from my point of view because whatever bugs this library has you are probably going to hit them eventually with a use case like that, so they can be fixed.

I haven't had the time to work on this yet, but I've been keeping a tab open for it in my browser, I'm relying on this library too so definitely whatever bugs are found I'm committed to fix them. In the meantime if you have the time to look more closely at these issues PRs are very welcome.

@fabiospampinato

Apparently not a recursive one nodejs/node#36005 (comment)

Perhaps it shouldn't use Node modules for that at all, but instead exec a system command via Node's child_process?
There are some articles on built-in Linux file system watchers, e.g.:
https://stackoverflow.com/questions/8699293/how-to-monitor-a-complete-directory-tree-for-changes-in-linux

Scanning the directory is done asynchronously precisely in order not to block the thread, I'm not sure what's happening here,

Yep, it's async, but I think it uses 100% of the thread, and since it runs on the same thread as the main process, the main process cannot do anything else during the scan. It's a known issue with async functions - if they are super computationally intensive, they will still block the thread they run on. That's why I proposed pausing the read stream every 1 ms or so, or specifying the highWaterMark option (it throttles the read stream to a certain size; the default is 16 KB, I think), but this option can hurt performance. I'm not sure if there are better ways to handle this. Personally, in my app, I'm just running the scan in a web worker or in a separate Node process. I'm not even sure you should fix this; perhaps it's the developer's job to make sure the library is not used on the main thread?

Thanks.
Unfortunately I also don't have any time for this at the moment. I'm gonna have to use some workarounds for now.

An update on this:

  • I've added a new option, "native", which when set to "false" allows you to disable the native recursive watcher. For your use case it might be best to disable it, opting into the manual recursive watcher, which will respect the "depth" option you set. So basically setting native: false and depth: [low-number] should work much better for you. This should address point number 1.
  • Regarding the error that sometimes gets thrown, I've kind of blindly made the function that causes it more resilient, so you shouldn't see it anymore. I'm not sure if there's a deeper problem here though; according to Node's docs, I don't think "null" should get passed to the event handler for file system events. This should address point number 4.
  • Regarding issue number 3 I think the remaining root issue is that this function call here is not interruptible, so until it finishes it doesn't matter that you closed the watcher, and trying to scan the entire drive is going to take a while.
    • If you opt-out of the native watcher this shouldn't be that much of problem for you anyway if you use a low depth number.
    • If you want to use the native recursive watcher on very deep directories that may also cause the thread to freeze for a while, that I can't do anything about, maybe Node's maintainers can't either as the problem may reside in the actual native watchers.
    • If you use the manual recursive watcher but set a high depth number, then I think, first of all, the disk scan should be less aggressive, yielding to the event loop more frequently, and secondly, it should be interruptible. This requires some things to be changed internally; if the other issues you experienced are fixed in v1.1.0, I'd say let's move this to a new standalone GitHub issue.

Perhaps it shouldn't use Node modules for that at all, but instead exec a system command via Node's child_process?

Unless a native recursive watcher gets implemented or used for Linux by Node itself I don't think this library is going to do anything about it, I don't want to deal with other native binaries at all.

@fabiospampinato thanks for working on this and for the explanations. I'll try it again with the new options.

Though, why is the readdir function un-interruptible? Couldn't it use setInterval() and check whether some state.isCanceled is true every 100ms or so? When the property value becomes true, it would stop the recursive scan for the current directory, e.g.:

this.state = {
  isCanceled: false
}

function tinyReaddirRecursiveInterruptableScan (path) {
  if (this.state.isCanceled) { 
    return 
  }
  ...
}

Though, why is the readdir function un-interruptible?

That was just the previous state of things; I didn't mean to say that it's impossible to scan the drive in chunks and interrupt the whole scan at pretty much any point, just that being able to do that required some changes to be made.

I just finished making those changes, as of v1.1.1 now:

  • Directory scanning is interrupted automatically as soon as the watcher is closed, so no extra overhead there.
  • Directory scanning is performed in a way that shouldn't cause too many problems for the event loop; at most 500 shallow directories are scanned at once now. On my system (a ~5-year-old MBP) the library should perform work in ~2 ms chunks, so everything should remain pretty smooth at any point.

Everything reported in this issue should be addressed now, let me know if you can find any other issues and if in fact those issues actually got addressed for you.

@fabiospampinato great job 👍
I will try to switch from Chokidar to Watcher v1.1.1 in my file manager app and report if something's wrong.

Feel free to open another issue if you can find other issues.