karrick/godirwalk

Continuation: inquiry on the skipNode functionality in regards to file name filtering

zveinn opened this issue · 5 comments

Hello again,

This is a continuation of a previous issue since I couldn´t re-open the old one again as instructed.

I took a look at the find-fast example and it is indeed what I was looking for. However I was apparently not detailed enaugh when creating the issue.

I had already tried using the filepath.SkipDir but it seems to break the walk if I return that error on a file instead of a directory.

I did some debugging on the walk function and you can view the video here:
https://www.twitch.tv/videos/708174351

I find the (possible) issue @ 56:30 in the video and then I implement a solution that works for me.

However, I obviously don´t know all the logic behind this package and my solution might break something somewhere else so I would love it if you could check it out.

Regards,

Ah, yes. I agree this aspect is frustrating. When I created this library, it was to improve upon some of the problems with the standard library's filepath library, and accordingly, I copied the API and behavior as close as possible to the behavior as the standard library.

Here is a documentation quote from filepath.WalkFunc:

// processing stops. The sole exception is when the function returns the special
// value SkipDir. If the function returns SkipDir when invoked on a directory,
// Walk skips the directory's contents entirely. If the function returns SkipDir
// when invoked on a non-directory file, Walk skips the remaining files in the
// containing directory.

This is therefore why goavro skips the remaining files when it encounters a filepath.SkipDir error for a file entity.

I will take a look at your video and get back to you later. I have an idea how to fix this, and I would not be surprised to learn when I watch the video whether your suggestion matches my gut instinct.

Because file system enumeration by this library will always invoke the callback on a directory before invoking the callback on any of the directory's children, is there a reason that returning filepath.SkipDir when you encounter the directory is not sufficient to prevent enumeration of the directory's children?

The reason I ask is because #53 actually has an example where the code is performing a string contains check for .git which is a common directory name. This would be a prime use case of when one should return filepath.SkipDir from the callback function.

The more I think about it, there are really only two cases. When it is a directory that you want to skip and not descend into, which is allowed by returning SkipDir, and when the directory entry is a non-directory item, in which case simply returning nil is how one skips the entry.

At 42:45 in your video, you seemed surprised that walk would not continue when it received a filepath.SkipDir. The reason it cannot just continue is because the filepath.Walk API says that when Walk receives this token error value, if the current entry is a directory, it should skip that directory, otherwise it should stop processing the rest of the file system entries in the current directory.

At 48:33 in your video you said you do not think it is intended to break the loop, but unfortunately it is the API that filepath.Walk provided, and therefore the behavior that godirwalk.Walk provides.

Here is the documentation from filepath.Walk that this library must emulate:

// processing stops. The sole exception is when the function returns the special
// value SkipDir. If the function returns SkipDir when invoked on a directory,
// Walk skips the directory's contents entirely. If the function returns SkipDir
// when invoked on a non-directory file, Walk skips the remaining files in the
// containing directory.

I am not sure whether you are aware, but you can use conditional compiling to tell godirwalk to print debugging output in your programs:

go run --tags godirwalk_debug main.go --skip .git ../../

I have pushed another branch to come up with a more friendly API that I hope addresses your concerns.

@zkynet, I am going to close this issue since I pushed a new version that has this feature and have not heard back from you. Feel free to comment or open a new issue if you want.