gforceg/node-hound

Cannot watch large directories

ericvicenti opened this issue · 5 comments

Directories with too many files result in this exception for me on OSX with node v0.8.8. In my case, "too many" files is 162 files clocking in at 22.7MB on disk. I'm not sure where the breaking threshold is, but I would hope we can watch directories of this size or greater.

var a = hound.watch('/Users/me/Projects/project')

Error: watch EMFILE
at errnoException (fs.js:806:11)
at FSWatcher.start (fs.js:837:11)
at Object.fs.watch (fs.js:861:11)
at Hound.watch (/XYZ/node_modules/hound/hound.js:45:27)
at Hound.watch (/XYZ/node_modules/hound/hound.js:42:12)
at Hound.watch (/XYZ/node_modules/hound/hound.js:42:12)
at Hound.watch (/XYZ/node_modules/hound/hound.js:42:12)
at Hound.watch (/XYZ/node_modules/hound/hound.js:42:12)
at Hound.watch (/XYZ/node_modules/hound/hound.js:42:12)
at Object.exports.watch (/XYZ/node_modules/hound/hound.js:13:11)

I have gotten around this in the short term by setting up a separate hound watcher for each subdirectory within the project.
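A minimal sketch of that workaround, for anyone who wants to try it. The helper name `watchSubdirs` is hypothetical, and the watch function is passed in (for example `(dir) => hound.watch(dir)`) so the sketch does not assume anything about hound beyond the `hound.watch(path)` call shown above.

```javascript
const fs = require('fs');
const path = require('path');

// Start one watcher per immediate subdirectory of `root`.
// `watchFn` is supplied by the caller, e.g. (dir) => hound.watch(dir).
function watchSubdirs(root, watchFn) {
  return fs.readdirSync(root)
    .map((name) => path.join(root, name))
    // Keep only directories; plain files in `root` are skipped.
    .filter((entry) => fs.statSync(entry).isDirectory())
    .map((dir) => ({ dir, watcher: watchFn(dir) }));
}
```

Note that files sitting directly in the project root are not covered by this sketch; they would need their own watcher.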

For comparison, and it might not be an applicable one because of hound's feature-set, nodemon successfully watches directories of this size.

Hi @ericvicenti, sorry about the big delay in looking into this one. I'll have a look the next time I have a chance.

@ericvicenti do any of these solutions fix the problem? nodejs/node-v0.x-archive#2479 (comment)

According to nodemon's documentation, on OSX it polls for modified files by repeatedly exec-ing find, which is way different than Hound's approach. Here is the relevant nodemon code.
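To illustrate the polling idea in general terms, here is a rough sketch that walks the tree with `fs` and compares modification times between two passes. This is not nodemon's code (which, per its documentation, shells out to find); `snapshot` and `diffSnapshots` are hypothetical names.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively record the modification time of every file under `dir`.
function snapshot(dir, out = new Map()) {
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.statSync(full);
    if (stat.isDirectory()) snapshot(full, out);
    else out.set(full, stat.mtimeMs);
  }
  return out;
}

// Compare two snapshots and list what was created, changed, or deleted.
function diffSnapshots(before, after) {
  const created = [];
  const changed = [];
  const deleted = [];
  for (const [file, mtime] of after) {
    if (!before.has(file)) created.push(file);
    else if (before.get(file) !== mtime) changed.push(file);
  }
  for (const file of before.keys()) {
    if (!after.has(file)) deleted.push(file);
  }
  return { created, changed, deleted };
}
```

Calling `snapshot` on a timer and diffing against the previous result keeps no watch handles open, at the cost of re-statting every file on each pass.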

EDIT: I just realized you raised this issue almost a year ago, whoops.

EDIT EDIT: Hound could use the implementation of fs.watch from https://github.com/medikoo/fs2/. It detects EMFILE errors and automatically switches to using fs.watchFile internally.
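Roughly, the fallback idea could look like the sketch below. This is not fs2's actual implementation; `watchWithFallback` is a hypothetical name, the `fsImpl` parameter exists only so the function can be exercised with a stub, and it assumes EMFILE is thrown synchronously from fs.watch, as in the stack trace above.

```javascript
const fs = require('fs');

// Try fs.watch first; if the process is out of file descriptors (EMFILE),
// fall back to stat-based polling with fs.watchFile.
function watchWithFallback(target, onChange, fsImpl = fs) {
  try {
    const watcher = fsImpl.watch(target, () => onChange(target));
    return { mode: 'watch', close: () => watcher.close() };
  } catch (err) {
    // Anything other than EMFILE is a real error and is rethrown.
    if (err.code !== 'EMFILE') throw err;
    const listener = (curr, prev) => {
      if (curr.mtimeMs !== prev.mtimeMs) onChange(target);
    };
    // The 1000 ms interval is an arbitrary choice for this sketch.
    fsImpl.watchFile(target, { interval: 1000 }, listener);
    return {
      mode: 'watchFile',
      close: () => fsImpl.unwatchFile(target, listener),
    };
  }
}
```

The returned `mode` field is only there to make it visible which mechanism ended up being used.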

Unfortunately, the problem still exists. This was my result today with a 42.7MB folder containing 2,322 files: https://gist.github.com/ericvicenti/3bc2ca55f71cceb9e6c7

Woah, I didn't realize nodemon uses such a hack for this! That explains why it is so slow..

The solution that fs2 uses isn't perfect, but it's much less hacky. (And damn, those node APIs are poorly named.. no wonder this is considered unstable.) This is a very tricky issue to solve without native support.

I also found this issue with some useful answers on SO: http://stackoverflow.com/questions/8965606/node-and-error-emfile-too-many-open-files

From there I discovered https://github.com/isaacs/node-graceful-fs, maintained by one of the main node developers. The watch does not error out on large directories, because it does not recursively watch every subdir. On OSX, the .DS_Store gets updated when changes occur in subdirs, so graceful-fs.watch catches it:

https://gist.github.com/ericvicenti/53f8253e38cda873ba15

Now the challenge is to figure out exactly what file changed..

My Googling suggests that .DS_Store files are only created by the OSX Finder application when it accesses a directory. If a user is working exclusively with the command line, is the .DS_Store file reliably modified for every file modification deep within the directory structure?

Where does graceful-fs modify the behavior of fs.watch? I didn't see that mentioned in the readme.

Erm, yeah.. you're right. .DS_Store files are only maintained by Finder, so that is definitely not a working fix.

You're also correct about graceful-fs not modifying watch.. I got that impression from this SO answer: http://stackoverflow.com/a/15934766

Back to the drawing board. Like I said, this seems to be a very tricky issue, even on a native level..