hoytech/vmtouch

Ignoring files / file filter

EtienneBruines opened this issue · 11 comments

Hi,

I like vmtouch 😄

I was wondering if it would make sense to filter out some directories / filenames? (I mean, make it possible as a command/argument.) Maybe a way to ignore "hidden" files altogether?

Use-case

My use-case is that I have about 15k directories, of which 12k belong to git (in directories called .git), and I don't need all of those in my cache. (Especially since they contain the entire history, and not just the current state.)

Alternatives

I've tried using things like find -type d | egrep -v ".git" | xargs -n1 vmtouch -l to achieve this, but because vmtouch recurses into every directory it is given, it didn't help much.


Thoughts? (I'd be happy to do some programming if you want to guide me in the direction you want me to take. I'd rather spend time doing something you'll actually want to merge ;-) )

Hi, thanks for the idea!

Just to clarify, you're thinking something like this:

vmtouch -i .git .

And that would ignore any file or directory named .git in the entire nested hierarchy below .?

It's a really good idea, I like it! A pull request would be appreciated.

A few notes off the top of my head:

  • This could partially address some use cases mentioned in #23
  • If people have a large number of ignores, it might make sense to put them into some kind of fast lookup structure such as a hash table (however see the next point)
  • It would be cool to use fnmatch(3) so you could do things like -i '.*' to ignore all files whose names start with a dot. Of course this complicates the fast lookup mentioned in the previous point. I think that people will typically use a small number of ignores and the wildcard feature is very useful, so just iterating over the ignores is probably good enough.
  • I have an unmerged patch in my inbox to switch the crawl to using nftw(3) instead of a custom crawler. One of the main use cases was for FTW_MOUNT (so you don't crawl across an NFS mount for example). This is why I suggested fnmatch instead of glob(3): It will be easier to integrate the nftw patch if we aren't relying on the glob stuff.
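The "just iterate over the ignores" idea from the notes above could look roughly like this. It is a minimal sketch, not the code from the eventual patch: the function name name_is_ignored and its parameters are made up for illustration, and only fnmatch(3) itself comes from the discussion.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `name` matches any of the `count`
 * shell-style patterns, 0 otherwise. A plain linear scan, which should be
 * fine when the ignore list is small.
 */
int name_is_ignored(const char *name, const char *const *patterns, size_t count)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        /* fnmatch() returns 0 when the pattern matches the string */
        if (fnmatch(patterns[i], name, 0) == 0)
            return 1;
    }
    return 0;
}
```

A pattern without wildcard characters only matches that exact name, so literal entries such as .git keep working alongside patterns such as '*.o'.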

And just another idea:

It would be nice to be able to invert the logic and process only the files that do match the pattern:

vmtouch -t -i '.*' .    # ignore hidden files
vmtouch -t -I '*.zip' . # only touch zip files
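The difference between the two flags comes down to negating one match result. A rough sketch, assuming a single pattern per call; the enum and the function name should_touch are hypothetical and not taken from vmtouch:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical modes mirroring the proposed -i and -I flags. */
enum match_mode {
    MODE_IGNORE, /* -i: skip names that match the pattern  */
    MODE_ONLY    /* -I: keep only names that match pattern */
};

/* Returns 1 if `name` should be processed under the given mode, else 0. */
int should_touch(const char *name, const char *pattern, enum match_mode mode)
{
    assert(name != NULL && pattern != NULL);
    int matched = (fnmatch(pattern, name, 0) == 0);
    return mode == MODE_ONLY ? matched : !matched;
}
```

One design point to keep in mind: an "only" test like this presumably has to be applied to regular files and not to directories, otherwise a directory whose name does not match '*.zip' would never be descended into.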

I think I'm going to do it in steps:

  • -i to ignore filenames / directories with that exact name
  • multiple entries - not sure if I should allow -i .git -i .svn, or something like -i .git,.svn. Thoughts?
  • wildcards
  • -I.

Not sure if I'll have enough time for all of those, but at least it's a start.

Thanks! If you run out of time no worries, I can pick it up.

Re: multiple entries, I think the first is preferable (-i .git -i .svn).

Thanks! Looks good, I just added a few minor comments to the commit. If you're busy just let me know and I can fix them up.

Cheers,

Doug

Thank you for the feedback! I have attempted to address everything, except for the "freeing" of memory. I keep getting exceptions when I try to free it. Perhaps that'd be something you can take on.

I think I've done everything I can on this PR (it's not much, but it was fun to help nevertheless). Friendly reminder: you may want to use the "Squash and merge" button instead of a merge commit, because I also used a few "fix" commits to process your kind feedback. 😄

Thank you very much! I'm sure people are going to find uses for this feature, myself included.

Also it just occurred to me that there is a function for getting the last component in a path: basename(3). I'll probably make it use that.
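For reference, a sketch of how basename(3) might be used to get the name that is compared against the ignore list. The POSIX version from <libgen.h> is allowed to modify its argument, so this example works on a copy; last_component and the 4096-byte scratch buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <libgen.h>
#include <string.h>

/*
 * Hypothetical helper: copies the final component of `path` into `out`.
 * Returns 0 on success, -1 if the path or the result does not fit.
 */
int last_component(const char *path, char *out, size_t outsz)
{
    char tmp[4096];

    assert(path != NULL && out != NULL);
    if (strlen(path) >= sizeof tmp)
        return -1;
    strcpy(tmp, path); /* POSIX basename() may modify its argument */

    const char *base = basename(tmp);
    if (strlen(base) >= outsz)
        return -1;
    strcpy(out, base);
    return 0;
}
```

With the POSIX variant, trailing slashes are stripped before the last component is taken, so "project/.git/" and "project/.git" yield the same name.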

Thanks again!

Pull request has been merged, and I added another commit to fix some minor things in your patch (most of it just personal style).

Thanks!

I added wildcard support too: f1029b1

Was pretty much just a matter of changing strcmp to fnmatch :)

And here's the -I feature: e5d1311